Location: Web development - PHP    License: The Creative Commons Attribution-ShareAlike 2.5 License

Basic OCR

Posted by xsist10

Building a basic OCR framework

Skill: Advanced

Posted: 12/01/2009

Introduction

Optical Character Recognition (or OCR) is a method of discerning a pattern from an image. It's often used to convert printed or handwritten text into machine-editable text. This article describes a simple method that uses a basic neural network approach, with some basic supervised learning thrown in for adapting your OCR system.

Quick Overview of Neural Networks

Neural networks (like a lot of AI methods) imitate nature. A neural network works in layers. The most basic example is an input layer (1 to x possible inputs), a hidden layer that processes the various inputs (1 to y possible hidden processes) and an output layer (1 possible output).

Neural_network_example.png

Outputs can then be used as inputs for another input layer.

For OCR we use the image of the character as the input layer, broken down into values from 1 (black) to -1 (white). The learnt image of a character (broken down in the same way) acts as the processing layer, and the output is a value from 0.0 to 1.0 indicating how well the two match.

Code

Here is an image of the character R.

R

First we'll want to load our image. The easiest way is to use PHP's GD library functions.

// Load the JPEG with GD; the call returns false if the file cannot be read
$im = imagecreatefromjpeg($file);

You can use the following code to get the colour of a point in the image.
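A minimal sketch of reading one pixel, assuming $im is a truecolor GD image (which is what imagecreatefromjpeg produces) and that the coordinates lie inside the image. GD's imagecolorat returns the colour of a truecolor pixel as one packed integer; unpackRgb is a hypothetical helper name, not part of GD.

```php
<?php
// Hypothetical helper: split the packed integer that imagecolorat()
// returns for a truecolor image into its red, green and blue channels.
function unpackRgb(int $rgb): array {
    return [
        'r' => ($rgb >> 16) & 0xFF,
        'g' => ($rgb >> 8) & 0xFF,
        'b' => $rgb & 0xFF,
    ];
}

// Usage, assuming $im was loaded as above and ($x, $y) is a valid point:
// $colour = unpackRgb(imagecolorat($im, $x, $y));
```

The bit shifting is kept in its own function so it can be checked without an image to hand.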

We then get each colour in the image and convert it into a value from -1 to 1.
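One way to do that conversion is sketched here. It assumes the three channels are simply averaged into a brightness and then mapped linearly so that black becomes 1 and white becomes -1. The exact formula is not stated in the article, and colourToScale is a hypothetical name.

```php
<?php
// Hypothetical helper: map an RGB colour onto the -1..1 scale,
// with black at 1 and white at -1 (assumes a plain channel average).
function colourToScale(int $r, int $g, int $b): float {
    $brightness = ($r + $g + $b) / 3;       // 0 (black) .. 255 (white)
    return 1.0 - 2.0 * $brightness / 255;   // 1 (black) .. -1 (white)
}

// colourToScale(0, 0, 0) returns 1.0 and colourToScale(255, 255, 255) returns -1.0
```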

The output should look something like this:

OCR6.jpg

I had previously created a map of the optimal match for the letter 'R' and stored it in the system. The scale map looks something like this.

OCR5.jpg

We then sum the absolute values of the optimal match and get a value:

53.72
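This sum is the highest score any sample could reach against the stored map, which is why it is used for scaling later on. A minimal sketch, assuming the map is held as a flat array of values between -1 and 1; maxScore is a hypothetical name and the four values are made up for illustration.

```php
<?php
// Hypothetical helper: the largest score a sample could reach against
// $optimal, reached when every sample cell is +1 or -1 with the same sign.
function maxScore(array $optimal): float {
    return (float) array_sum(array_map('abs', $optimal));
}

// Made-up 4-cell map for illustration (the article's map has more cells).
$optimal = [1.00, -0.50, 0.25, -1.00];
echo maxScore($optimal), "\n"; // prints 2.75
```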

What we want to do is determine how close a match the two scale maps are. To do this we multiply each value in the sample map by the corresponding value in the optimal map and sum the products:

Sum(x(i) × y(i)), where x(i) is the sample value and y(i) is the optimal value for cell i

0.90 x 1.00 + 0.83 x 0.90 + 0.73 x 1.00 + 0.78 x 0.93 +
 1.00 x 0.48 + 0.96 x -0.69 + -0.95 x -1.00 + 0.72 x 0.97 +
-0.63 x -0.98 + -0.98 x -0.98 + -0.88 x -0.84 + -0.26 x -0.10 +
0.63 x 0.65 + -0.77 x -0.94 + 0.67 x 1.00 + -0.69 x -0.91 +
-1.00 x -1.00 + -0.95 x -1.00 + -0.82 x -0.85 + 0.90 x 0.90 +
-0.91 x -1.00 + 0.64 x 0.98 + -0.65 x -1.00 + -0.87 x -0.95 +
-0.82 x -0.89 + 0.83 x -0.01 + 0.65 x 0.59 + -0.95 x -1.00 +
0.39 x 1.00 + 0.21 x 1.00 + 0.27 x 0.99 + 0.44 x 1.00 +
0.72 x 0.68 + -0.84 x -0.77 + -0.94 x -0.95 + 0.09 x 0.97 +
0.99 x -1.00 + -0.07 x -1.00 + -0.11 x -0.76 + 0.36 x 0.63 +
-0.49 x -0.41 + -0.99 x -0.95 + 0.47 x 1.00 + -0.76 x -0.95 +
-0.97 x -0.96 + -1.00 x -1.00 + -0.81 x -0.78 + 0.66 x 0.83 +
-0.57 x -0.94 + 0.65 x 1.00 + -0.87 x -0.98 + -1.00 x -1.00 +
-0.98 x -0.99 + -0.91 x -1.00 + 0.69 x 0.37 + 0.84 x -0.25 +
 0.88 x 1.00 + -0.84 x -0.99 + -0.98 x -1.00 + -1.00 x -1.00 +
-0.88 x -1.00 + -0.40 x -0.44 + 0.89 x 0.60

= 36.41

Now to get the fitness of the match we divide this value by the sum of the absolute values of the optimal match (53.72).

36.41 / 53.72 = 0.68 (68% match)
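The whole calculation can be sketched as a single function. This assumes both maps are flat arrays of the same length holding values between -1 and 1; matchFitness is a hypothetical name and the sample data is made up. A sample that is closer to the inverse of the stored map would give a negative result, which this sketch does not clamp.

```php
<?php
// Hypothetical helper: score a sample map against a stored optimal map.
function matchFitness(array $sample, array $optimal): float {
    if (count($sample) !== count($optimal)) {
        throw new InvalidArgumentException('Maps must be the same size');
    }
    $score = 0.0;
    foreach ($sample as $i => $value) {
        $score += $value * $optimal[$i];   // sample cell times optimal cell
    }
    // Scale by the best score the optimal map allows
    $max = array_sum(array_map('abs', $optimal));
    return $max > 0 ? $score / $max : 0.0;
}

// Made-up 4-cell maps: two cells agree, two disagree.
$optimal = [1.0, -1.0, 0.5, -0.5];
$sample  = [1.0, -1.0, -0.5, 0.5];
echo matchFitness($sample, $optimal), "\n"; // prints 0.5
```

With the article's figures the same division is 36.41 / 53.72, which rounds to 0.68.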

Now we have a value, normally between 0 and 1, that gives us an indication of how close the match is. The higher the value, the better.

Strengthening the Optimal

That was a decent match, but we want the match to be a lot stronger (0.8+). To do this we average the original optimal map and the sample, value by value, and save the result as the new optimal map for R.

(x(i) + y(i)) / 2 = z(i)
(0.90 + 1.00) / 2 = 0.95
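Applied to every cell, the update might look like the sketch below. It assumes both maps are flat arrays with the same keys; strengthen is a hypothetical name, and the two-cell input reuses the first two pairs of values from the worked sum above.

```php
<?php
// Hypothetical helper: blend a sample into the stored map by averaging
// each cell of the stored map with the matching cell of the sample.
function strengthen(array $optimal, array $sample): array {
    $updated = [];
    foreach ($optimal as $i => $value) {
        $updated[$i] = ($value + $sample[$i]) / 2;
    }
    return $updated;
}

// First two cells of the stored map and of the sample.
$updated = strengthen([1.00, 0.90], [0.90, 0.83]);
echo round($updated[0], 2), "\n"; // prints 0.95
```

A plain average gives the newest sample as much weight as everything learnt before it, so one poor scan can pull the stored map a long way.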

Conclusion

Character recognition is a lot of fun to play around with. Of course, I've not covered issues like removing static from scanned images, cutting sentences up into words and then letters, or correcting page rotation. However, this is a good start towards a system that can take a scan of a page of text and convert it into machine-readable text.

License

This article, along with any associated source code and files, is licensed under The Creative Commons Attribution-ShareAlike 2.5 License

About the author

xsist10

Location: South Africa
