Computer Vision Challenge 4: OCR

Sep 25

This is a challenge we’re working on at the Silicon Valley Computer Vision Meetup: use OCR to read a receipt. Specifically, this receipt:

We’ll be using an OCR engine called Tesseract. To get started with Tesseract:

1. Install Tesseract by following its installation instructions. Be sure to install the language training data for the language on the receipt (English).

2. Download the full-size receipt image.

3. From the directory containing the image, run this command:

tesseract IMG_2288.jpg out

4. Open the file “out.txt”. (Tesseract appends “.txt” to the output name you give it.) You should see, among other things, the text:

SANTA CRUZ HOTEL
Red Restaurant and Bar

Congratulations, you’ve got Tesseract up and running!
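
If you'd rather script steps 3 and 4 than eyeball the output, here is a minimal Python sketch. It assumes the `tesseract` executable is on your PATH and that the image sits in the current directory. The helper names (`build_command`, `missing_lines`, `run_and_check`) are made up for this example and are not part of Tesseract.

```python
import shutil
import subprocess
from pathlib import Path

# The two lines the post says should appear in the OCR output.
EXPECTED_LINES = ["SANTA CRUZ HOTEL", "Red Restaurant and Bar"]


def build_command(image, out_base):
    """Return the argument list for `tesseract <image> <out_base>`."""
    return ["tesseract", str(image), str(out_base)]


def missing_lines(text, expected=EXPECTED_LINES):
    """Return the expected lines that do not appear in the OCR text."""
    return [line for line in expected if line not in text]


def run_and_check(image="IMG_2288.jpg", out_base="out"):
    """Run Tesseract, read <out_base>.txt, and report which lines are missing."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not on PATH")
    subprocess.run(build_command(image, out_base), check=True)
    # Tesseract writes its plain-text result to <out_base>.txt.
    text = Path(f"{out_base}.txt").read_text(encoding="utf-8", errors="replace")
    return missing_lines(text)
```

Calling `run_and_check()` returns an empty list when both lines were recognized exactly. Anything it returns is a line Tesseract misread or missed.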

Along with the text, you’ll see a lot of garbage. The next step is to tune Tesseract so that it captures all of the text.
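
One possible place to start tuning (my assumption, not a prescribed approach) is Tesseract's page segmentation mode, which tells the engine what kind of layout to expect. The sketch below tries a few modes and scores each result with a very crude "how much of this is garbage" measure. It assumes Tesseract 4 or later, where the flag is spelled `--psm`; older 3.x releases use `-psm` instead. The function names and the scoring heuristic are invented for this example.

```python
import shutil
import subprocess
from pathlib import Path

# Page segmentation modes to try: 3 is Tesseract's default (fully automatic),
# 4 assumes a single column of text, 6 assumes a single uniform block of text.
CANDIDATE_PSMS = (3, 4, 6)


def psm_command(image, out_base, psm):
    """Return the argument list for one Tesseract run with the given mode."""
    return ["tesseract", str(image), str(out_base), "--psm", str(psm)]


def clean_ratio(text):
    """Share of non-whitespace characters that are ASCII letters or digits.

    A rough heuristic only: receipts legitimately contain '$', '.' and ':',
    so a perfect transcription will still score below 1.0.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isascii() and c.isalnum() for c in chars) / len(chars)


def compare_psms(image="IMG_2288.jpg"):
    """Run Tesseract once per candidate mode and return {psm: clean_ratio}."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not on PATH")
    scores = {}
    for psm in CANDIDATE_PSMS:
        out_base = f"out_psm{psm}"
        subprocess.run(psm_command(image, out_base, psm), check=True)
        text = Path(f"{out_base}.txt").read_text(encoding="utf-8", errors="replace")
        scores[psm] = clean_ratio(text)
    return scores
```

A higher score only hints at less garbage. You still need to read the `out_psm*.txt` files to see which mode actually captures more of the receipt.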

John Brewer
