OCR This! :-)
I should point out before I start that I'm using Ubuntu 12.04.
So, anyway, I screen-captured the areas of the screen that formed the list into a series of image files. At this point I tried OCRing one of them straight off the bat, using the tesseract OCR engine:
tesseract input.png output
It didn't go too well, and looked a bit like this:
tH3 i tt s3f 1
5ce eg p e3t s 1nage amd
c0mf 1 1 b@d c tot ck!
Not the most useful.
I figured it couldn't read the images because they were too small or blurry, so I scaled them up to twice the size of the original and reduced the colour depth from 16 million colours to just 2 - black and white.
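For anyone who wants to see what those two clean-up steps amount to, here is a rough sketch in plain Python. It is only an illustration: it works on a small grid of greyscale values (0 to 255) rather than a real PNG, and the function names and the threshold of 128 are assumptions made for the example, not the settings actually used on the screenshots.

```python
def upscale_2x(pixels):
    """Double the width and height by repeating every pixel (nearest neighbour)."""
    out = []
    for row in pixels:
        wide = [value for value in row for _ in range(2)]  # repeat each pixel horizontally
        out.append(wide)
        out.append(list(wide))  # repeat the widened row vertically
    return out


def to_black_and_white(pixels, threshold=128):
    """Reduce to two colours: values at or above the threshold become white, the rest black."""
    return [[255 if value >= threshold else 0 for value in row] for row in pixels]


def preprocess(pixels, threshold=128):
    """Scale up first, then reduce to black and white."""
    return to_black_and_white(upscale_2x(pixels), threshold)


# A tiny 2x2 "image" of greyscale values, purely for demonstration.
sample = [[30, 200],
          [140, 90]]
result = preprocess(sample)
for row in result:
    print(row)
```

In practice an image editor or an imaging library would do the same job on the actual files; the point is just that the OCR engine ends up with bigger characters and crisp edges.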
I tried it again:
tesseract input2.png output2
And this time, it was more like this:
This is a test to see if l can
screengrab text as an image and
convert it back to taxt!
So, still not quite 100% but very, very close. Good enough for my purposes.
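To put a rough number on "very, very close", the second result can be compared with the text it was presumably meant to be. This is a small sketch using Python's standard difflib module; the `intended` string is an assumption (the obvious reading of the sentence), since the original text isn't given anywhere else.

```python
import difflib

# The second OCR result, joined into one line.
ocr_output = ("This is a test to see if l can "
              "screengrab text as an image and "
              "convert it back to taxt!")

# Assumed original sentence (not stated explicitly, just the obvious reading).
intended = ("This is a test to see if I can "
            "screengrab text as an image and "
            "convert it back to text!")

# ratio() returns a similarity score between 0.0 and 1.0.
similarity = difflib.SequenceMatcher(None, ocr_output, intended).ratio()

# Count the positions where the two strings disagree.
mismatches = sum(1 for a, b in zip(ocr_output, intended) if a != b)

print("similarity: {:.3f}, mismatched characters: {}".format(similarity, mismatches))
```

The same comparison run against the first attempt would score far lower, which is a quick way to check whether a given clean-up step actually helps.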
To install tesseract, if you are using Ubuntu 12.04 (or similar), run this as root (hence the sudo):
sudo apt-get install tesseract-ocr