Applying to college, it turns out, is a lot like good software development. The standard, recommended way of doing things is to lay down your path first, figure out what you ultimately want as an end result, and design the program or list of colleges before beginning the development (application process). And, if the route I am taking and most of my friends are taking are any indication, it’s also a lot like programming in that most people don’t really focus on the design and end up with a program that has way too many features, or a college list with 18 colleges on it.
Don’t worry though, since this particular post isn’t going to focus on college applications. It just happens to turn out that the poor design of a college list is a good analogy for code optimization. A bit of background first though. The average amateur programmer knows what code optimization is but doesn’t consider it important to them or their code. Until Project Euler, I was the same way. Project Euler, for those who don’t know, is a mathematics and programming challenge which presents relatively simply mathematical concepts (which get more complex as the problems do), and challenges a user to solve them. There is no need to use a specific programming language, or for that matter even to use a programming language, though most of the problems require it. In addition to being a great way to learn to program or refine programming skills, it is also a very fun challenge.
In order to not spoil the fun of solving the more advanced problems here, I’m discussing Problem #3. I’m not posting the answer, but I am discussing source code and optimization methods, so have a crack at it yourself if you want before reading on.
Problem #3 involves prime numbers, so it is useful to begin by reviewing the prime number rules briefly.
- Prime Number: A natural number whose only distinct natural factors are 1 and itself.
- Composite Number: A positive integer which has at least one positive factor other than 1 or itself.
- 1 is neither a prime nor composite number.
- Any composite number can be broken down into key prime factors. For example, 6 is 2*3, and 150 is 2*3*5*5.
- When searching for prime factors, we don’t care about the exponents. So while 150′s prime factorization is 2*3*5^2, it’s prime factors are simply 2, 3, and 5.
That might be even more information then necessary for this problem, but it is basic prime factor information and pretty light mathematically, so it’s good to cover here. Next up, some simple programming terms I’ll be using throughout this post:
- Programming Language: A format for writing source code, which is interpreted or compiled by the computer
- Algorithm: A method of doing something using an instruction sequence. Not limited to computer science in this definition, stretches as far as psychology.
- Efficiency: A measure of how quickly and simply an algorithm runs when scaled
- Brute-Force Algorithm: An algorithm in computer science which involves trying every possible solution to determine if it is correct. It is not necessarily the least efficient method, though it is often less efficient then other algorithms.
The programming language I’m using to solve this problem is Python, which I’ve mentioned before. Python is an easy-to-use programming language that can be learned quickly but applied to many advanced tasks, and it’s also the most common reported language used on Project Euler problems. I’ll explain my code, but since Python emphasizes code readability, it should make sense.
Okay, so finally, on to Problem #3. The problem asks to find the largest prime factor of the number 600851475143. Finding prime factors would involve breaking down the problem into two simple components: generating prime numbers and finding prime factors for a composite number.
The brute force method for finding prime numbers is taking a number, dividing it by every lower number except 1, and returning true if no numbers divide evenly into it. We continue to cycle through prime numbers until an exit condition is met, such as having found all primes below a given limit or having found the next prime in a sequence.
For this problem, the brute-force method of finding primes is sufficient and in fact should function rapidly. I may revisit this later to discuss more advanced mathematical ways of finding primes, including using a Sieve of Eratosthenes for a given upper bound, though the math behind that is a bit more complex then I intend this post to be. So, here is the code for the simple prime number generator using the brute force method:
# nextPrimen>(n) finds the next prime after n
def nextPrime(n):
val = n + 1
while(True):
# Check if val (n + 1) is a prime
if checkPrime(val):
return val
else:
val = val + 1
# checkPrime(n) returns true if n is a prime number
def checkPrime(n):
x = 2
while(x < n):
if n % x == 0:
# Factor other then 1 or n, number is composite
return False
x = x +1
# Number is prime, while loop terminated without finding factor
return True
This code should be pretty straight-forward, and it’s clearly a brute-force algorithm for generating primes. For the prime factor finder method though, the brute force algorithm involves dividing the number n by every prime below it. Since we’ve already spent a lot of time on finding the primes, checking every single prime number would be ridiculous and take forever. So we need to find a faster way to do it.
This is where optimization steps in, and where Project Euler begins to challenge the problem-solver to think outside of the box. The brute-force solutions are simple and straightforward, but for the large values in problems, such as 600851475143, the brute-force algorithm would take a long time to complete. Project Euler states that any of their problems can be simply solved by the computer in under a minute (that is, that while creating the program may take hours or even days of planning and thinking and writing, the code itself should execute in under a minute). As hackers working on an easy problem, we also understand that the most elegant solution is one which is both efficient and simple.
Finding the more optimized solution requires thought and consideration as to how efficient the solution will be. In computer science, algorithmic efficiency is often expressed using Big-O Notation, where the order of the algorithm is expressed in terms of n, n being the number that will be scaled in the problem. It is important before using Big-O Notation to know what n actually represents. In a server program, it might represent number of users, for example. It is always some value we expect to scale, though. In our case, n is the constant we have to break down: 600851475143. The constant they broke down for us is 13195, which when compared to 600851475143 is a very low number. So an efficient algorithm should not take a huge amount longer to check 600851475143 then it does 13195.
I won’t show detailed Big-O Notation, better to save that for another day. However, it can basically be broken down as a way of describing how a given algorithm scales. Linear order would increase by a constant amount as n increases, for example, while exponential order would increase rapidly as n got larger and larger, and logarithmic order would increase slowly over large changes in n. Constant order would not be affected by the value of n, such that the time to process a small n would be the same as the time to process a large n. We already know that our brute force algorithm also has to take into account the brute force algorithm used by our prime generator, which is of a quadratic order—not perfect, but fine for now. Checking every prime below n would take a while, so it’s time to start brainstorming some ways to improve our method.
First, there is a rule that for any number n, the only prime factor greater than the square root of n is n, or in other words, all prime factors other than the number itself are below the square root of that number. This could be used to our advantage, but all it does is change what n is, not the efficiency of the code (meaning for very large numbers like ours, even though the square root is much lower, the code still won’t run fast enough). When stuck behind an optimization problem, often the best thing to do is discuss the program efficiency with someone else. I discussed this program with Jonathan at The Twilight Sun (see my sidebar), and we decided that the best method would be to constantly break down the number while processing it:
# Returns prime factors of n. Returns an empty list if n is prime.
def factors(n):
factorList = []
prime = 1
while (nextPrime(prime) <= n):
while n % nextPrime(prime) == 0:
n = n / nextPrime(prime)
factorList.append(nextPrime(prime))
prime = nextPrime(prime)
return factorList
I know that this code is a lot less self-explanatory then my earlier code, but it should still seem pretty simple. Every time we find a prime factor of n, we divide n by that prime factor, and start looking for the prime factors of this new, divided number. So, with our example prime of 13195, it will find the factor 5 and n becomes 2639 (13195/5). The while loop repeats over the same prime (note that we haven’t changed the starting prime yet, and therefore haven’t changed the number we are dividing by) to try and break it down even further, which will fail because 2639 does not divide into 5 evenly (without a remainder). Now it continues with n as 2639 and finds that 7 works, making n 377. The next functional value is 13, which makes n 29, which is our final prime factor.
We will always end with our greatest prime factor, so we could just return the new n. What’s more, we can even further optimize this problem by throwing in the square root rule, which works to get us n even faster. So here is the final, super-optimized greatest factor finder.
import math
# Prime Factor Generator Code is above
# Returns greatest prime factor of n, other then n itself.
def greatestPrime(n):
prime = 1
while (nextPrime(prime) <= math.sqrt(n)):
while n % nextPrime(prime) == 0:
n = n / nextPrime(prime)
prime = nextPrime(prime)
return n
As expected, this code calculates greatestPrime(600851475143) super fast. However, it is important to note that the final value will have an ‘L’ at the end of it. This is Python’s way of informing us that the value is stored as a long number. Even though our final value isn’t a huge number, the value we started with (600851475143) is too big to be stored as a simple integer. Python handles the data type for us (dynamic typing) but doesn’t change it (strong typing).
Some final closing notes regarding both this problem and the blog. First off, this post was way longer then I would have liked, so I expect to be doing shorter posts in the future, and breaking long topics like code optimization down a bit. Second, I am aware that I could use a library for prime number generation, and I’m sure the library code is more elegant than mine. I opted not to here in order to best demonstrate the process without using libraries, and I recommend others going through Project Euler keep this advice in mind: while it is true that one hacker mantra is that no problem should be solved more than once, it is also important that before using someone else’s code you fully understand how and why it works, even if you could not have developed it on your own. In this way, you learn to read and write code, and not simply to use the tools available but to understand how they work. Finally, I used while loops everywhere, including some places where for loops seemed to make more sense. A for loop in most languages is a while loop given three statements, one the standard conditional of the while loop, another the initial value for a variable in the while loop, and the third an increment or operation to be performed at its completion. Python does not use for loops in this way: instead they parse through a list of values, thereby serving a function which cannot be emulated with while loops, sticking with the Python ideal of having only one right way to do something. While for loops can be used like one would expect with the range() function, I felt that using a while loop made more sense in terms of memory.
So again, I plan to make this shorter next time. Hope you learned something!
PS: I’m very interested in your thoughts on this post. Particularly: what is your experience level and how did this post feel relative to that (i.e. was the programming language description worthless or was Big-O notation too complex, I kind of covered a broad range here), what did you like and what did you hate about this post, what do you want to see more of from me (tutorials, programming experiences, hacker culture stuff outside of programming), etc. I’d love to hear from anyone reading!
By: Dylan on September 30, 2009
at 7:35 pm
What the hell is this? You stole this from wikipedia! Jerk.
By: The Mass Effect on October 8, 2009
at 8:58 am
I most certainly did not. As I mentioned in my first post, I won’t censor non-spam comments, but I have to wonder where you got the idea that anything in my post whatsoever was from Wikipedia. All I could find on Wikipedia regarding Project Euler was a 1st problem solution. None of their information on brute-force searching was specific and the pseudocode simply explained the idea. If you’re referring to the overall idea of code optimization, well, that’s about as ridiculous as saying a physicist stole their Newtonian mechanics work from Wikipedia just because Wikipedia has an article on the topic.
I didn’t use Wikipedia at all to write this article, so I’d love to see the Wikipedia article which you find to be similar.
By: Dylan on October 8, 2009
at 10:01 am
BS, I entered ur text into google, it is on wikipedia. Why you so unprofessional, where’s ur intellecual integrity biotch.
By: The Mass Effect on October 9, 2009
at 11:37 am
Again, link? I tried that with my text (which I didn’t plagiarize), and nothing comes up except my own blog. If this really is on Wikipedia, whoever posted it there probably took it from me, so I would love a link. Except, of course, it’s not really on Wikipedia anyway.
Oh, and in case you haven’t noticed, online, degree of professionalism is usually initially indicated by the way one types. So using “ur” twice, “BS” once, and “Why you” probably says more about you then it does about me.
@Everyone Else: I’m not sure why people enjoy doing this type of stuff (posting pointless spammy comments, vandalizing wikis, etc). Anyone have some insight? He/she seems to be wasting a lot more of his/her life then mine.
By: Dylan on October 9, 2009
at 4:59 pm
Actually, I would say that you are wasting more time than they are. They are posting quick and poorly written posts while you appear to be putting in much more time and effort into your posts as they are longer and better written.
It’s possible that this is their goal. Throw a quick comment at you, with the intent of messing with you, and hope that you will over analyze it and spend a considerable amount of time to come up with a good reply.
By: Cat Lover on October 22, 2009
at 9:03 pm