Computer Vision Challenge 4: OCR

This is a challenge we’re working on at the Silicon Valley Computer Vision Meetup. The goal is to use OCR (optical character recognition) to read a receipt. Specifically, this receipt:

 
Receipt for OCR


 

We’ll be using an OCR engine called Tesseract. To get started with Tesseract:

1. Install Tesseract by following its installation instructions. Be sure to install the training data for the language you need (English for this receipt).

2. Download the full-size receipt image.

3. Run the following command:

tesseract IMG_2288.jpg out

4. Open the file “out.txt” (Tesseract appends the “.txt” extension to the output name you give it). You should see (among other things) the text:

SANTA CRUZ HOTEL
Red Restaurant and Bar

Congratulations, you’ve got Tesseract up and running!
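If you would rather check the result from a script than by eye, steps 3 and 4 can be sketched in Python roughly as follows. This is only a minimal sketch: it assumes the `tesseract` executable is on your PATH and that the image is saved as IMG_2288.jpg in the current directory. The helper names `run_tesseract` and `missing_lines` are made up for this example and are not part of Tesseract.

```python
import subprocess
from pathlib import Path

# The two lines the post says you should find in the output.
EXPECTED_LINES = ["SANTA CRUZ HOTEL", "Red Restaurant and Bar"]


def missing_lines(ocr_text, expected=EXPECTED_LINES):
    """Return the expected lines that do not appear in the OCR output."""
    return [line for line in expected if line not in ocr_text]


def run_tesseract(image_path, out_base="out"):
    """Run `tesseract <image> <out_base>` and return the recognized text.

    Tesseract writes its result to <out_base>.txt, so that file is read back.
    """
    subprocess.run(["tesseract", str(image_path), out_base], check=True)
    return Path(out_base + ".txt").read_text(encoding="utf-8", errors="replace")
```

Calling `missing_lines(run_tesseract("IMG_2288.jpg"))` should give an empty list if Tesseract is working as described above. Anything it returns is a line the engine failed to read.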

Along with the text, you’ll see a lot of garbage.  The next step is to tune Tesseract so that it captures all of the text.
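One possible starting point for tuning, offered as a sketch rather than a recommended recipe, is to try a few of Tesseract’s page segmentation modes and compare how noisy the output looks. The code below assumes a Tesseract version that accepts the `--psm` option (older 3.x releases spell it `-psm`). The `clean_ratio` score is a crude heuristic invented for this example, and `build_command` and `compare_modes` are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def build_command(image_path, out_base, psm):
    """Build a tesseract command line for one page segmentation mode."""
    # Assumes a Tesseract build that accepts `--psm`; 3.x releases use `-psm`.
    return ["tesseract", str(image_path), out_base, "--psm", str(psm)]


def clean_ratio(text):
    """Fraction of non-whitespace characters that are letters or digits.

    A rough stand-in for "how much garbage": stray symbols lower the score.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isalnum() for c in chars) / len(chars)


def compare_modes(image_path, modes=(3, 4, 6)):
    """Run Tesseract once per mode and return {mode: clean_ratio}."""
    results = {}
    for psm in modes:
        out_base = f"out_psm{psm}"
        subprocess.run(build_command(image_path, out_base, psm), check=True)
        text = Path(out_base + ".txt").read_text(encoding="utf-8", errors="replace")
        results[psm] = clean_ratio(text)
    return results
```

Running `compare_modes("IMG_2288.jpg")` gives one score per mode. A higher score only suggests fewer stray symbols, so it is still worth reading the output files to see which mode actually captures more of the receipt.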

John Brewer