Basic OCR: Building a basic OCR framework
Posted by xsist10, 12/01/2009 | Skill: Advanced
Location: Web development - PHP | License: The Creative Commons Attribution-ShareAlike 2.5 License
Optical Character Recognition (OCR) is a method of recognizing patterns, such as text characters, in an image. It is often used to convert printed or handwritten text into machine-editable text. This article describes a simple method that uses a basic neural-network approach, with some basic supervised learning thrown in so that your OCR system can adapt.
Neural networks (like many AI methods) imitate nature. A neural network is organized in layers. The most basic example has an input layer (1 to x inputs), a hidden layer that processes those inputs (1 to y hidden units), and an output layer with a single output.

These outputs can in turn be used as the inputs of another input layer.
For OCR, the input layer is the image of the character, with each point converted to a value between 1 (black) and -1 (white). The processing layer is the learnt image of a character, converted in the same way. The output is a single value indicating how well the two match, where 1.0 is a perfect match.
Here is an image of the character R.

First we'll want to load our image. The easiest way is to use PHP's GD library functions.
$im = imagecreatefromjpeg($file); // returns false if $file cannot be loaded as a JPEG
To get the colour of a single point in the image, you can use GD's imagecolorat() function.
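A minimal sketch of reading one point, assuming a truecolour image (imagecreatefromjpeg() returns one). For truecolour images, imagecolorat() packs the red, green and blue channels into a single integer, which can be split with bit shifts. The helper names unpackRgb() and pixelColour() are hypothetical, not part of GD.

```php
<?php
// Hypothetical helper: split a packed truecolour value (as returned by
// imagecolorat() on a truecolour image) into red, green, blue (0-255 each).
function unpackRgb(int $colour): array
{
    return [
        ($colour >> 16) & 0xFF, // red
        ($colour >> 8) & 0xFF,  // green
        $colour & 0xFF,         // blue
    ];
}

// Hypothetical helper: colour of the pixel at ($x, $y).
// Requires the GD extension; assumes $im is a truecolour image.
function pixelColour($im, int $x, int $y): array
{
    $colour = imagecolorat($im, $x, $y);
    if ($colour === false) {
        throw new RuntimeException("Cannot read pixel ($x, $y)");
    }
    return unpackRgb($colour);
}
```

Usage: after `$im = imagecreatefromjpeg($file);`, the call `[$r, $g, $b] = pixelColour($im, 10, 15);` gives the colour of the point at (10, 15). For palette-based images, pass the value from imagecolorat() to imagecolorsforindex() instead, which returns an array with 'red', 'green' and 'blue' keys.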
We then read the colour of every point in the image and convert it to a value from -1 (white) to 1 (black).
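One possible way to do this conversion, as a sketch under stated assumptions: brightness is taken as the plain average of the three channels, and the image is then shrunk to a fixed grid by averaging blocks of pixels. The grid step is an assumption. Its purpose is to give every sample the same number of cells as the stored map, which the cell-by-cell comparison requires. colourToScale() and toScaleMap() are hypothetical names.

```php
<?php
// Convert an RGB colour (0-255 per channel) to the scale used here:
// 1.0 = black, -1.0 = white. Assumption: brightness = channel average.
function colourToScale(int $r, int $g, int $b): float
{
    $brightness = ($r + $g + $b) / 3;       // 0 (black) .. 255 (white)
    return 1.0 - 2.0 * $brightness / 255.0; // 1.0 (black) .. -1.0 (white)
}

// Hypothetical: shrink a 2-D array of scale values ($pixels[$y][$x]) to a
// $cols x $rows grid by averaging the pixels inside each cell.
// Returns a flat array, row by row.
function toScaleMap(array $pixels, int $cols, int $rows): array
{
    if ($pixels === [] || $pixels[0] === [] || $cols < 1 || $rows < 1) {
        throw new InvalidArgumentException('Need pixels and a grid of at least 1x1');
    }
    $height = count($pixels);
    $width = count($pixels[0]);
    $map = [];
    for ($gy = 0; $gy < $rows; $gy++) {
        for ($gx = 0; $gx < $cols; $gx++) {
            // Pixel rectangle [x0, x1) x [y0, y1) covered by this cell,
            // always at least one pixel wide and high.
            $x0 = intdiv($gx * $width, $cols);
            $x1 = max($x0 + 1, intdiv(($gx + 1) * $width, $cols));
            $y0 = intdiv($gy * $height, $rows);
            $y1 = max($y0 + 1, intdiv(($gy + 1) * $height, $rows));
            $sum = 0.0;
            for ($y = $y0; $y < $y1; $y++) {
                for ($x = $x0; $x < $x1; $x++) {
                    $sum += $pixels[$y][$x];
                }
            }
            $map[] = $sum / (($x1 - $x0) * ($y1 - $y0));
        }
    }
    return $map;
}
```

To build $pixels, loop $y over imagesy($im) and $x over imagesx($im), read each point with imagecolorat(), and pass its red, green and blue values to colourToScale(). The grid size is an arbitrary choice, but the sample and the stored map must use the same one.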
The output should look something like this:

Previously, I created a map of the optimal match for the letter 'R' and stored it in the system. This scale map looks something like this:

We then sum the absolute values of every cell in the optimal match map, which gives:
53.72
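This sum matters because it is the best score any sample can reach. Every sample value lies between -1 and 1, so the product sum computed next is largest when each sample cell is 1 or -1 with the same sign as the stored cell. That maximum is exactly the sum of absolute values. A one-line sketch follows, where absSum() is a hypothetical name:

```php
<?php
// Sum of |y_i| over a stored map: the highest product sum any sample
// (with values in -1..1) can reach against this map.
function absSum(array $map): float
{
    return (float) array_sum(array_map('abs', $map));
}
```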
Next we want to determine how closely the two scale maps match. To do this, we multiply each cell $x_i$ of the sample map by the corresponding cell $y_i$ of the optimal map and add up all the products, which gives a single value:
$$\sum_i x_i \, y_i$$
0.90 x 1.00 + 0.83 x 0.90 + 0.73 x 1.00 + 0.78 x 0.93 +
1.00 x 0.48 + 0.96 x -0.69 + -0.95 x -1.00 + 0.72 x 0.97 +
-0.63 x -0.98 + -0.98 x -0.98 + -0.88 x -0.84 + -0.26 x -0.10 +
0.63 x 0.65 + -0.77 x -0.94 + 0.67 x 1.00 + -0.69 x -0.91 +
-1.00 x -1.00 + -0.95 x -1.00 + -0.82 x -0.85 + 0.90 x 0.90 +
-0.91 x -1.00 + 0.64 x 0.98 + -0.65 x -1.00 + -0.87 x -0.95 +
-0.82 x -0.89 + 0.83 x -0.01 + 0.65 x 0.59 + -0.95 x -1.00 +
0.39 x 1.00 + 0.21 x 1.00 + 0.27 x 0.99 + 0.44 x 1.00 +
0.72 x 0.68 + -0.84 x -0.77 + -0.94 x -0.95 + 0.09 x 0.97 +
0.99 x -1.00 + -0.07 x -1.00 + -0.11 x -0.76 + 0.36 x 0.63 +
-0.49 x -0.41 + -0.99 x -0.95 + 0.47 x 1.00 + -0.76 x -0.95 +
-0.97 x -0.96 + -1.00 x -1.00 + -0.81 x -0.78 + 0.66 x 0.83 +
-0.57 x -0.94 + 0.65 x 1.00 + -0.87 x -0.98 + -1.00 x -1.00 +
-0.98 x -0.99 + -0.91 x -1.00 + 0.69 x 0.37 + 0.84 x -0.25 +
0.88 x 1.00 + -0.84 x -0.99 + -0.98 x -1.00 + -1.00 x -1.00 +
-0.88 x -1.00 + -0.40 x -0.44 + 0.89 x 0.60
= 36.41
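The sum above is a dot product of the two maps. The same calculation can be written as a short PHP sketch. matchScore() is a hypothetical name, and both maps are assumed to be flat arrays of the same length, stored row by row.

```php
<?php
// Multiply each sample cell x_i by the stored cell y_i and sum the products.
function matchScore(array $sample, array $optimal): float
{
    if (count($sample) !== count($optimal)) {
        throw new InvalidArgumentException('Maps must have the same number of cells');
    }
    $score = 0.0;
    foreach ($sample as $i => $x) {
        $score += $x * $optimal[$i];
    }
    return $score;
}
```

For example, the first four terms of the sum above: `echo round(matchScore([0.90, 0.83, 0.73, 0.78], [1.00, 0.90, 1.00, 0.93]), 4);` prints 3.1024.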
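To get the fitness of the match, we divide this value by the sum of the absolute values of the optimal match (53.718), where $x_i$ and $y_i$ are corresponding cells of the sample and optimal maps:
$$f = \frac{\sum_i x_i \, y_i}{\sum_i |y_i|}$$
36.41 / 53.72 = 0.68 (a 68% match)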
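Putting the two sums together gives the following sketch of the fitness calculation. fitness() is a hypothetical name. The guard for an all-zero stored map is an added assumption that avoids a division by zero.

```php
<?php
// Hypothetical helper: f = sum_i(x_i * y_i) / sum_i(|y_i|)
// Both maps are flat arrays of scale values (-1 = white, 1 = black).
function fitness(array $sample, array $optimal): float
{
    if (count($sample) !== count($optimal)) {
        throw new InvalidArgumentException('Maps must have the same number of cells');
    }
    $score = 0.0; // sum of products x_i * y_i
    $best = 0.0;  // sum of |y_i|: the highest score any sample can reach
    foreach ($optimal as $i => $y) {
        $score += $sample[$i] * $y;
        $best += abs($y);
    }
    // An all-zero stored map carries no information; treat it as no match.
    return $best > 0.0 ? $score / $best : 0.0;
}
```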
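We now have a value that indicates how close the match is: the higher the value, the better. Since every sample value lies between -1 and 1, the fitness can never exceed 1 (a perfect match), and it only drops below 0 when the sample mostly disagrees with the stored map.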
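With one stored map per character, recognition amounts to scoring the sample against every map and keeping the highest. The following is a hypothetical sketch. The $templates layout (character => map) and the $score callback are assumptions, and $score would typically be the fitness calculation described above.

```php
<?php
// Hypothetical helper: score one sample against every stored character map.
// $templates: ['R' => [...], 'P' => [...], ...] (flat maps of equal length).
// $score: callable(array $sample, array $map): float.
// Returns [character => score], best match first.
function rankCharacters(array $sample, array $templates, callable $score): array
{
    $scores = [];
    foreach ($templates as $char => $map) {
        $scores[$char] = $score($sample, $map);
    }
    arsort($scores); // highest score first; keys (characters) are kept
    return $scores;
}
```

`array_key_first(rankCharacters($sample, $templates, $scoreFn))` gives the best-matching character. The function keeps the full ranking rather than just the winner, which lets you reject a sample whose best score is still low.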
That is a decent match, but we want matches to be much stronger (0.8 or higher). To get there, we take the cell-by-cell average of the stored optimal map and the sample, and save the result as the new 'R':
$$z_i = \frac{x_i + y_i}{2}$$
For example, for the first cell: (0.90 + 1.00) / 2 = 0.95
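The training step can be sketched as follows. learn() is a hypothetical name, and the maps are assumed to be flat arrays of equal length.

```php
<?php
// Hypothetical training step: average a sample into the stored map,
// cell by cell: z_i = (x_i + y_i) / 2.
function learn(array $sample, array $optimal): array
{
    if (count($sample) !== count($optimal)) {
        throw new InvalidArgumentException('Maps must have the same number of cells');
    }
    $updated = [];
    foreach ($optimal as $i => $y) {
        $updated[$i] = ($sample[$i] + $y) / 2;
    }
    return $updated;
}
```

With a plain average, the newest sample carries as much weight as everything previously learnt, so one unusual sample can shift the map a lot. A common variant, not the method described here, is y_i + rate * (x_i - y_i) with a smaller rate. A rate of 0.5 reproduces the plain average.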
Character recognition is a lot of fun to play around with. Of course, I haven't covered issues such as removing static from scanned images, segmenting sentences into words and words into letters, or correcting page rotation. However, this is a good start towards a system that can take a scan of a page of text and convert it into machine-editable text.
This article, along with any associated source code and files, is licensed under The Creative Commons Attribution-ShareAlike 2.5 License