Thursday, June 19, 2008

The Art of analysis and problem solving

I have been wondering about the art of analysis and problem solving and decided to do a writeup using an example. A lot of developers tend to jump to writing code and never seem to bother about thinking out their solutions. For them if it works its fine. Moving away from this is a pre-requisite for anyone who wants to hone his development skills. In order to convey my point, i take up a seemingly simple example which displays how to analyze a solution before implementing it.

Problem Statement

Write a function which removes a set of characters from a given String. The signature of the function should be as follows

public String removeCharsFromString(String source, String remove)


Problem Analysis
The problem involves the following 2 steps
1) For each character in source, determine whether it needs to be deleted
2) Delete the character

Lets start with the actual deletion process first. You have to remove an element from an array. An array is a contiguous block of memory, so you cant remove an element from the middle as can be done with a linked list. So you have to rearrange the elements to maintain it as a contiguous block. For example if you need to remove c from an array containing abcd , you have to rearrange the elements. So you can either shift ab up or shift d down. In addition you need to decrease the size of the string by one.

Now how would the proposed algorithm work if you had to delete all elements from the source string. If the String is n characters long, this would mean shifting the last element by n-1, the second last by n-2 and so on, giving the worst time ass O(n^2).

How can we avoid this. What if you allocated a temporary string buffer and built your modified string there instead of in place? Then you could simply copy the characters you need to keep into the temporary string, skipping the characters you want to delete. When you finish building the modified string, you can copy it from the temporary buffer back into str. This way, you move each character at most twice, giving O(n) deletion. However, you’ve incurred the memory overhead of a temporary buffer the same size as the original string, and the time overhead of copying the modified string back over the original string. Is there any way you could avoid these penalties while retaining your O(n) algorithm?

To implement the O(n) algorithm just described, you need to track a source position for the read location in the original string and a destination position for the write position in the temporary buffer. These positions both start at zero. The source position is incremented every time you read, and the destination position is incremented every time you write. In other words, when you copy a character you increment both positions, but when you delete a character you increment only the source position. This means the source position will always be the same as or ahead of the destination position. Once you read a character from the original string (that is, the source position has advanced past it), you no longer need that character - in fact, you’re just going to copy the modified string over it. Because the destination position in the original string is always a character you don’t need anymore, you can write directly into the original string, eliminating the temporary buffer entirely. This is still an O(n) algorithm, but without the memory and time overhead of the earlier version.

Now to the actual part of deciding whether a character needs to be deleted. The easiest way is to compare all characters in the remove string with all the characters in the source string. If the size of the source string is n and the size of the remove string is m, this entails a running time of O(nm). How can we reduce this to a running time of O(m).

If we construct an array or a hashtable which has a constant time lookup, we can reduce the running time to O(m).
Lets construct an array containing all Booleans which is indexed by all the possible values of the characters in the source string. This enables you to determine whether a character is in remove by checking a single element.

Now the function will have 3 parts
  • Set all elements in the lookup array to false
  • Iterate through each element in remove, setting the corresponding value in the lookup array to true
  • Iterate through str with a source and destination index, copying each character only if its corresponding value in the lookup array is false
Now if we combine all the operations, the running time would be O(n+m) which is still linear.

Now lets take a look at the code


public String removeCharOccurances(String source,String remove){
char[] sourceArr = source.toCharArray();
char[] removeArr = remove.toCharArray();
boolean[] flags = new boolean[128]; // assumes ASCII!
int len = sourceArr.length;
int len1=removeArr.length;
int src, dst;

// Set flags for characters to be removed
for( src = 0; src < len1; ++src ){
flags[removeArr[src]] = true;
}
src = 0;
dst = 0;

// Now loop through all the characters,
// copying only if they aren't flagged
while( src < len ){
if( !flags[ (int)sourceArr[src] ] ){
sourceArr[dst++] = sourceArr[src];
}
++src;
}

return new String( sourceArr, 0, dst );
}

The emperor and me beaching

The Devil next door

Kaiser The Emperor