|
Location: Computer science - Algorithms License: The Intelliproject Open License (IPOL) Measuring a Program's EfficiencyPosted by Bryan ChenAnalysing a program's/algorithm's efficiency |
Skill: IntermediatePosted: 04/12/2008Views: 638Rating: 5.00 /5Popularity: 0.00 |
| Sign Up to vote for this article |
The problem of inefficiency is more concerning today than ever. The fact that the barrier to creating web programs and applications is even lower today than it was two decades ago has done much to worsen the problem.
The fact is that the typical high school 'freelancer' doesn't understand the concept of efficient algorithms in their programs. Now, I must admit that I was once like that, oblivious to the lacuna in my knowledge of programming. Whenever I was faced with a problem to tackle programmatically, I'd just create a workaround. The truth is I wasn't alone. Many people would rather create workarounds than to do substantial research in order to solve an algorithmic problem.
This article will help the lay-person understand the basics of the analysis of code efficiency and possibly present tested solutions to common problems.
Now, say we have the following piece of code:
How could we measure something like this? Well, first of all we can never definitely say for sure how long, in milli- or even nano-seconds it would take for the piece of code to run, as it will differ depending on the machine you are running it on.
Furthermore, upon closer inspection, we can see that apart from the loops, the only useful work done in the code is temp + (i * j). This is called a basic operation. In many cases, basic operations are mathematical expressions or variable assignments, but could be other kinds of statements as well.
Finally, we realise that the number of times our basic operation is run depends entirely on the value of n. If n = 5, our code would perform the basic operation exactly 25 times. Similarly, if n = 10, the basic operation would be executed 100 times.
Therefore, the time required to run the code above is dependent on the machine, the number of basic operations that have to be performed on each iteration, and the number of iterations, which in this case is itself is dependent on the size of n.
Now look at the code again - notice that the first loop, or outer loop, runs n times. The nested loop, or inner loop, runs n times for each time that the outer loop runs. This means that in total, the basic operation would run n*n times, or simply, n2. The polynomial n2 is actually the order of growth, which means that the algorithm grows at a quadratic rate when n is increased.
When measuring the efficiency of algorithms, we generally attempt to measure the order of growth of the algorithm. This is because the actual running time is, as described above, dependent on factors that are beyond our control or parameters that we do not know at design time. In the above example, the algorithm belongs to the class of algorithms that have a quadratic order of growth. There are, obviously various other orders of growth, and the table below shows some of them, together with their english-friendly names.
| Order | 1 | log2n | n | nlog2n | n2 | n3 | 2n | n! |
| Known also as | Constant | Logarithmic | Linear | n-log-n | Quadratic | Cubic | Exponential | Factorial |
As you can see from above, every order of growth is affected by the size of n, except for the first, which is known as constant time. Of course, not everything can be achieved in a fixed amount of time - and by fixed I mean that no matter what the size of n is, the program performs the same number of basic operations.
If it isn't apparent to you yet why most algorithms cannot run in constant time, let me try to convince you. Think of an array of size n containing integers that are in no particular order - random, as you call it. Now visualise some code in your favourite programming language that you could write to sort the array in an increasing or decreasing order, your choice. I know many of you will be saying that you'd just call a sort method/function provided by a library. If that is the case, I strongly urge you to read up on the sorting technique used by that library.
I'm sure if you tried writing some code to sort the array, you'd probably end up with an algorithm that ran in n2 time at least. In fact, you probably came up with a sorting technique called bubble sort or if you were more meticulous, you would have come up with selection sort or insertion sort. Whatever the case, all these algorithms are of the quadratic order of growth, n2. Is there a more efficient method of doing sorting? Sure there is - quicksort and mergesort have a better efficiency class, nlog2n. If you have the time, I encourage you to read up the abovementioned sorting algorithms on Wikipedia or something.
Now, having a sorted array after all that hooha, you'd like to know whether a certain integer value is stored in the array. Say your array now looks like this:
| 1 | 2 | 5 | 8 | 10 | 15 | 18 | 20 | 22 | 23 |
Although you can look and tell me in a heartbeat that the integer 2 is certainly one of the elements of the array, the computer can't. To illustrate this fact, go and write some code that finds whether a given integer exists in the array.
Again, like the example above, if you rushed through the exercise, you'd probably come up with a technique that iterated through each element of the array to check whether the given integer matches each element of the array. That technique is linear, or n. In other words, this particular technique would, in the worst case where the integer to be found is at the end of the array, iterate through all n elements, thus executing the basic operation n times. Even if you were a bit more prudent and "told" the loop to stop immediately after it successfully matches the element, there will always be the possibility that the element that matches is the last, or that there are no elements that match in the first place. Okay, I probably lost you by now.
Now if you sat there and spent some time considering the problem of finding (or searching) for an element in the array, you'd realise that because the array is already sorted, you can employ a simple nifty technique. If you were trying to search for the integer 2 in this case, you could start in the middle of the array, and compare that element with 2, to see if you should be looking at the half containing the smaller integers or the half containing the larger integers.
If you compared 2 with the middle element 10 then you'd immediately know that 2, if it exists, would reside in the left half of the array. So you can completely ignore the right half, essentially dividing the problem in two. You'd then simply repeat the steps, comparing 2 with the element in the middle of the left half, which in this case is 5. Again, you'd know that 2, if it exists, would be in the left half of that, again dividing the problem set in two.
If you kept doing the above, you'd eventually end up with the element you were searching for, or know that it doesn't exist. Since each time you were dividing the problem set into two parts, the technique's efficiency is log2n, or logarithmic in nature. This technique is also known as binary search, and it is the fastest search known. The only caveat is that you can only use it on a sorted array, and the most efficient sorting technique is in nlog2n.
Now that you know the most fundamental characteristics and methods of analysing an algorithm's efficiency, you should be able to write more efficient code. Whenever you're faced with a problem, don't go reaching for the first solution that comes to mind. Spend a few minutes to think the problem through before deciding on the best way to approach it.
We know that if we have a nested loop that is dependent upon a sentinel n, then the algorithm probably is in n2 efficiency. We have also learned that the most efficient sorting method is still nlog2n, and given a sorted sequence of elements, we can search for a particular element in log2n. We know that these are very most likely the limits we can reach for problems of those types.
As a programmer, you must be aware of the limits of computational power. You have to know whether an algorithm could be made more efficient, or whether a given problem has no efficient solution, so that time is not wasted attempting to find a more efficient one.
There are also many times where the efficiency is ammortised. For instance, if we had to only load our array once, and then we'd be performing a billion searches after that, even if we used bubble sort to sort the array, we'd have effectively "spread" the inefficiency over the billion searches that we could perform using binary search. In other words, sometimes a solution that is inefficient and that does not have a more efficient solution can still be useful as its inefficiency is necessary for the "better good".
This article, along with any associated source code and files, is licensed under The Intelliproject Open License (IPOL)
| Bryan Chen
| Graduated in 2011 with a Bachelor's Degree (with honours) in Computer Science from the National University of Singapore. I also hold a Diploma in Electrical & Computer Control Engineering and a Diploma in Law from the University of London. My first programming language was Perl, but over the years I have become more proficient with Java, PHP and C#. Functionally, I can do VB.NET, C, C++, Prolog, Haskell, Scheme, Python, Javascript and Ruby. Location: |
Sign up to post message on the article message board!