Computer Vision Challenge 4: OCR

This is a challenge we’re working on in the Silicon Valley Computer Vision Meetup.  This challenge is to use OCR to read a receipt. Specifically, this receipt:


Receipt for OCR

We’ll be using an OCR engine called Tesseract. To get started with Tesseract:

1. Install Tesseract using the instructions. Be sure to install the appropriate language training data.

2. Download the full-size receipt image.

3. Run this command:

tesseract IMG_2288.jpg out

4. Look at the file “out.txt” (Tesseract adds the “.txt” extension to the output name you give it).  You should see (among other things) the text:

Red Restaurant and Bar

Congratulations, you’ve got Tesseract up and running!

Along with the text, you’ll see a lot of garbage.  The next step is to tune Tesseract so that it captures all of the text.
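One knob worth trying first is the page segmentation mode. The sketch below only builds and runs the command line. The `-l` and `--psm` flags are real Tesseract options in recent releases (older 3.x builds spell the second one `-psm`), but the helper names and the choice of mode 6 (treat the page as a single uniform block of text) are assumptions for illustration, not settings known to work best on this receipt.

```python
import shutil
import subprocess


def tesseract_command(image, out_base, psm=6, lang="eng"):
    """Build a Tesseract command line (hypothetical helper, not part of Tesseract)."""
    return ["tesseract", image, out_base, "-l", lang, "--psm", str(psm)]


def run_tesseract(image, out_base, psm=6, lang="eng"):
    """Run Tesseract if it is installed and return the recognized text, else None."""
    if shutil.which("tesseract") is None:
        return None
    subprocess.run(tesseract_command(image, out_base, psm, lang), check=True)
    # Tesseract writes its result to <out_base>.txt
    with open(out_base + ".txt", encoding="utf-8") as f:
        return f.read()


# Show the command that would be run for the receipt image.
print(" ".join(tesseract_command("IMG_2288.jpg", "out")))
```

Looping `run_tesseract` over a few `psm` values and comparing the output by eye is a cheap way to see which mode produces the least garbage.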

Computer Vision Challenge 3: Play Spot-It™

Our new challenge is to write a program that successfully plays the card game “Spot-It”.

The Game

There are several variations on the game, but the basic Spot-It mechanic is this:

  1. Two circular cards are turned over.
  2. Every pair of cards has precisely one symbol in common.
  3. The first player to point out the common symbol wins the round.

Here is a sample pair of Spot-It cards:

Two Spot-It Cards

In this example, the common symbol is a 4-leaf clover.
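The "precisely one symbol in common" rule can be checked in code once each card is reduced to a set of symbol labels. The sketch below builds a synthetic deck with that property using a standard finite-projective-plane construction (valid when `n` is prime) and finds the shared symbol with a set intersection. The function names and the integer symbol labels are assumptions for illustration; the real cards use pictures, and recognizing them is the actual challenge.

```python
from itertools import combinations


def spot_it_deck(n):
    """Build a deck where any two cards share exactly one symbol.

    Works when n is prime. Each card is a set of n + 1 integer symbols.
    """
    cards = []
    # n + 1 cards that all contain symbol 0 and otherwise use disjoint blocks.
    for i in range(n + 1):
        cards.append({0} | {1 + i * n + j for j in range(n)})
    # n * n cards that each take one symbol from 1..n and one from each later block.
    for i in range(n):
        for j in range(n):
            cards.append({i + 1} | {n + 1 + n * k + (i * k + j) % n for k in range(n)})
    return cards


def common_symbols(card_a, card_b):
    """Return the set of symbols that appear on both cards."""
    return card_a & card_b


deck = spot_it_deck(7)  # 8 symbols per card
all_pairs_ok = all(len(common_symbols(a, b)) == 1 for a, b in combinations(deck, 2))
print(len(deck), "cards; every pair shares exactly one symbol:", all_pairs_ok)
```

A deck like this is handy as test data: once your vision code outputs a set of labels per card, the final answer is just `common_symbols(left, right)`.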

Suggested Setup

Assume the cards will be laid out side by side, like in the above photo.  Split the input image in half, assuming one card on the left side, and one card on the right.  That way you can use the above photo to develop your algorithm, and then test it with a camera pointed at two real cards.

How to Match Symbols

There are a number of different ways to match symbols.

  • Identify and extract the features for each card, and then find the areas on each card whose features match those on the other card.
  • Extract the contours for each symbol, compute the moments for each contour, and then find the contours with the closest moments. The OpenCV call matchShapes might come in handy.
  • ?
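To make the moment-based idea in the list above concrete, here is a minimal pure-Python sketch that computes the first two Hu moment invariants of a binary mask and compares shapes by the difference between them. This is a simplified stand-in, not OpenCV's matchShapes formula (which compares log-scaled Hu moments of contours); the function names and the tiny test shapes are assumptions for illustration.

```python
def hu_first_two(mask):
    """Return the first two Hu invariants of a binary mask (rows of 0/1 values)."""
    pixels = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    m00 = len(pixels)
    cx = sum(x for x, _ in pixels) / m00
    cy = sum(y for _, y in pixels) / m00

    def eta(p, q):
        # Central moment, normalized so that it does not depend on scale.
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    h1 = eta(2, 0) + eta(0, 2)
    h2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return h1, h2


def shape_distance(mask_a, mask_b):
    """Smaller means more similar; 0 for shapes with identical invariants."""
    return sum(abs(a - b) for a, b in zip(hu_first_two(mask_a), hu_first_two(mask_b)))


l_shape = [[1, 0, 0],
           [1, 0, 0],
           [1, 1, 1]]
# The same L, rotated 90 degrees and shifted by a column of padding.
rotated_l = [[0] + list(row) for row in zip(*l_shape[::-1])]
bar = [[1, 1, 1, 1, 1]]

print("L vs rotated L:", shape_distance(l_shape, rotated_l))
print("L vs bar:", shape_distance(l_shape, bar))
```

Because the invariants ignore position and rotation, the rotated L scores as (nearly) identical to the original, while the bar does not. That matters for Spot-It, where the same symbol appears at different angles on each card.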

I’ll be focusing on the feature-based approach here.  I’ll post more here later as I work on my solution.

Update: Several members of the meetup have done some amazing things with this.

Soheil Eizadi has solved the problem for the sample image.  His code is available at:

JJ Stiff has gotten really nice outlines of the images. His code is available at:

Computer Vision Challenge 2: Object Tracking

This challenge is much more open-ended than the augmented reality challenge:

Given a somehow-designated object in a scene, track that object as it moves about the scene.

The object could be designated a number of different ways:

1. The largest object moving in the foreground.

2. An object that is a different color from the rest of the scene.

3. An object designated with some kind of GUI (e.g. click on the object to track).

4. Your idea here.

Similarly, “tracking” can mean a number of different things:

1. Overlay some kind of marker over the designated object in the scene, and move that marker as the object moves.

2. Move a camera to keep the object centered in the field of view.

3. Move a robot so that it follows the designated object without letting it get too close or too far away.

I suggest you start with either tracking the largest moving object in the foreground, or a uniquely colored object using a marker and a fixed camera.
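As a rough sketch of the uniquely-colored-object option with a fixed camera, the code below finds the centroid of all pixels close to a target color in each frame; that centroid is where the marker would be drawn. The frames are synthetic lists of RGB tuples so the example runs without a camera, and the function names, the tolerance value, and the test scene are all assumptions for illustration. With OpenCV you would do the same thing much faster using calls such as `inRange` and `moments`.

```python
def track_color(frame, target, tolerance=40):
    """Return the (x, y) centroid of pixels near `target`, or None if none match.

    `frame` is a list of rows; each row is a list of (r, g, b) tuples.
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(frame):
        for x, pixel in enumerate(row):
            if all(abs(c - t) <= tolerance for c, t in zip(pixel, target)):
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return sum_x / count, sum_y / count


def make_frame(width, height, left, top, size=2, color=(255, 0, 0)):
    """Synthetic test frame: a gray scene with one colored square."""
    frame = [[(90, 90, 90)] * width for _ in range(height)]
    for y in range(top, top + size):
        for x in range(left, left + size):
            frame[y][x] = color
    return frame


# A red square moving left to right across three frames.
frames = [make_frame(8, 6, left, 2) for left in (0, 2, 4)]
path = [track_color(frame, (255, 0, 0)) for frame in frames]
print(path)
```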

Streaming Video for Pebble

Back in 2012, I participated in the Kickstarter for Pebble, a smart watch that talks to your smart phone via Bluetooth. I was looking forward to writing apps for it. Unfortunately, my first Pebble had a display problem and by the time I got around to getting it exchanged, all the easy watch apps had been written.

I racked my brain for an application that hadn’t already been written. Then it hit me — streaming video! I could take a movie, dither it, and send it over Bluetooth from my iPhone to the Pebble. The only problem was: how would I get the video source?

Then I remembered, “Duh, I just wrote an app for that.” CVFunhouse was ideal for my purposes, since it converts video frames into easier-to-handle OpenCV image types, and then back to UIImages for display. All I had to do was process the incoming video into an image suitable for Pebble display, and then ship it across Bluetooth to the Pebble.

My first iteration just tried to send a buffer of data the size of the screen to the Pebble, and then have the Pebble copy the data to the screen. This failed fairly spectacularly. The hard part about debugging on the Pebble is that there’s no feedback. You build your app, copy it to the watch, and then run it. It either works or it doesn’t. (Internally, your code may receive an error code. But unless you do something to display it, you’ll never know about it.) Also, if your Pebble app crashes several times in rapid succession, it goes into “safe mode” and forces you to reinstall the Pebble OS from scratch. I had to do this several times during this process.

Eventually, I wrote a simple binary display routine, and lo and behold, I was getting errors. APP_MSG_BUFFER_OVERFLOW errors, to be exact, even though my buffer should have been more than sufficiently large to handle the data the watch was receiving. I discovered that there is a maximum allowed value for Bluetooth receive buffer size on Pebble, and if you exceed it, you’ll either get an error, or crash the watch entirely. I wanted to send 3360 bytes of data to the Pebble. I discovered empirically that the most I could send in one packet was 116 bytes. (AFAIK, this is still not documented anywhere.) Once I realized this, I was able to send image data to the Pebble in fairly short order, albeit only 5 scan lines at a time.
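The arithmetic behind "5 scan lines at a time" can be sketched as follows. The 3360-byte frame and 116-byte limit come from the text above; the 20-byte row stride and 168 rows are assumptions chosen because 20 × 168 = 3360 and 5 × 20 = 100 fits under 116. This is not the code from the watch app, and the function name is made up for illustration.

```python
ROW_BYTES = 20     # assumed row stride of the 1-bit frame buffer
ROWS = 168         # assumed number of scan lines (20 * 168 = 3360 bytes)
MAX_PACKET = 116   # largest payload that worked, per the text above


def split_into_packets(frame, row_bytes=ROW_BYTES, max_packet=MAX_PACKET):
    """Yield (first_row, chunk) pairs, each chunk holding whole scan lines."""
    rows_per_packet = max_packet // row_bytes   # 116 // 20 = 5
    step = rows_per_packet * row_bytes          # 100 bytes per packet
    for offset in range(0, len(frame), step):
        yield offset // row_bytes, frame[offset:offset + step]


frame = bytes(ROW_BYTES * ROWS)
packets = list(split_into_packets(frame))
print(len(frame), "bytes ->", len(packets), "packets, last one starts at row", packets[-1][0])
```

Sending the starting row along with each chunk lets the receiver copy the data into the right place even if a packet is dropped or arrives late.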

All that remained was to dither the image on the iPhone side. From back in the monochrome Mac days, I remembered a name: Floyd-Steinberg dithering. I Googled it, and it turns out that the Wikipedia article includes the algorithm, and it’s all of 10 lines of code. Once I coded that, I had streaming video.
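For reference, here is a minimal Python sketch of Floyd-Steinberg dithering as commonly described: threshold each pixel, then push the rounding error onto the neighbors to the right and below with weights 7/16, 3/16, 5/16 and 1/16. It is not the code that ran on the iPhone; the function name and the list-of-rows image format are assumptions for illustration.

```python
def floyd_steinberg(gray):
    """Dither a grayscale image (rows of 0-255 values) to pure black and white."""
    height, width = len(gray), len(gray[0])
    work = [[float(v) for v in row] for row in gray]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 255.0 if old >= 128 else 0.0
            work[y][x] = new
            err = old - new
            # Diffuse the quantization error to pixels not yet visited.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return [[int(v) for v in row] for row in work]


# Dither a horizontal gradient and print the first few rows as ASCII art.
gradient = [[x * 255 // 31 for x in range(32)] for _ in range(32)]
dithered = floyd_steinberg(gradient)
for row in dithered[:4]:
    print("".join("#" if v else "." for v in row))
```

Each output pixel is either 0 or 255, so the result packs down to one bit per pixel for a monochrome display.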

Unfortunately, the video only streamed at around 1 FPS on an iPhone 5. How I got it streaming faster is a tale for another day.

CVFunhouse, an iOS Framework for OpenCV

Ever since I took the free online Stanford AI class in fall of 2011, I’ve been fascinated by artificial intelligence, and in particular computer vision.

I’ve spent the past year and a half teaching myself computer vision, and in particular the open source computer vision library OpenCV. OpenCV is a cross-platform library that encapsulates a wide range of computer vision techniques, ranging from simple edge detection, all the way up to 3D scene reconstruction.

But since I develop primarily for iOS, there was an impedance mismatch. iOS deals with things like UIImages, CGImages and CVImageBuffers. OpenCV deals with things like IplImages and cv::Mats.

So I wrote a framework that takes care of all the iOS stuff, so you can focus on the computer vision stuff.

I call it CVFunhouse. (With apologies to Robert Smigel.)

As an app, CVFunhouse displays a number of different applications of computer vision. Behind the scenes, the framework is taking care of a lot of the work, so you can focus on the vision stuff.

To use CVFunhouse, you create a subclass of CVFImageProcessor. You override a single method, “processIplImage:” (or “processMat:” if you’re working in C++). This method will get called once for every frame of video the camera receives. Your method processes the video frame however you like, and outputs the processed image via a callback to imageReady: (or matReady: for C++).

The callback is important, because you’re getting the video frames on the camera thread, but you probably want to use the image in the main UI thread. The imageReady: and matReady: methods take care of getting you a UIImage on the main thread, and also take care of disposing of the pixels when you’re done with them, so you don’t leak image buffers. And you really don’t want to leak image buffers in an app that’s processing about 30 of them per second!

CVFunhouse is dead easy to use. The source is on GitHub at To get started, just run:

git clone

from the command line. Then open the project in Xcode, build and run.

I’ve now built numerous apps on top of CVFunhouse. It’s the framework I use in my day-to-day work, so it’s constantly getting improved. I hope you enjoy it too.

Your iPhone’s Seven Senses

Humans have five senses. Your iPhone has seven:

  • Touchscreen
  • Camera
  • Microphone
  • GPS (augmented by cell tower and WiFi location)
  • Accelerometer
  • Gyroscope
  • Magnetometer

(The magnetometer is normally used as a compass. But think for a moment — your iPhone can actually sense magnetic fields. That’s something only a few animals can do.)

Now here’s the sad part:

Most of the time we communicate with our iPhones via only one of those senses — touch. Virtually all of our interaction with our iPhones is via touching a screen the size of a business card. We talk with our iPhone like Anne Sullivan talked to Helen Keller.

But the iPhone isn’t blind or deaf. It can see and hear quite well, and it has a better sense of location and direction than most people.

But it’s very rare that apps take advantage of these senses. One of the few that does (other than navigation and photography apps) is the Apple Store app.

Note, I’m not talking about the App Store app, I’m talking about the app you use to purchase Macs and iPhones from Apple. The app that’s normally a friendly front end for the Apple Store website.

But when you run the app while you’re in (or near) an actual physical Apple retail store (like this one in Palo Alto), the Apple Store app gives you a bunch of new options. For example, it knows you’re in an Apple Store, so if you have a Genius Bar appointment there, it automatically checks you in for your appointment, and shows you a picture of the Genius who will be meeting you.

But the coolest thing you can do with the Apple Store app while at an actual Apple Store is self-checkout. You don’t need to find somebody in a blue shirt to help you with your purchase. Instead, you can just grab an item off the shelf, point your iPhone’s camera at its barcode, and enter your iTunes password. Your item is charged to the credit card associated with your iTunes account, and you’re free to walk out the door with it. It’s freaky weird the first time you do it, but also way cool.

And all this is done using just two of the iPhone’s senses: GPS and camera.

Imagine what you could do with all seven!

Shuttle Launch

So, it’s been a few months since I went to see the shuttle launch.

It was okay.

I guess you could sum up my feelings with that Peggy Lee song “Is That All There Is?” The reason that I went to see the shuttle launch is because of the essay Penn Jillette wrote about it in Penn and Teller’s “How to Play in Traffic”.

“It’s 3.7 miles away, and your looking at this flame and the flame is far away and it’s brighter than watching an arc welder from across a room[….] The fluffy smoke clouds of the angels of exploration spill out of your field of vision. They spill out of your peripheral vision.”

“You don’t exactly hear it at first, it almost knocks you over. It’s the loudest most wonderful sound you’ve ever heard. […] You can’t really hear it. It’s too loud to hear. It’s wonderful deep and low. It’s the bottom.”

“This is a real explosion and it’s controlled and it’s doing nothing but good and it makes your unbuttoned shirt flap around your arms. It’s beyond sound, it’s wind. It’s a man-made hurricane.”

The key point there being “3.7 miles away”. In the VIP section. I was closer to 7 miles away, along the NASA Causeway, in the closest section open to the general public. From there, the Shuttle is a tiny speck without binoculars, and the sound of the launch, when it hits you, is reminiscent of the sound of distant thunder in the Midwest. And with the low clouds, the whole show was over in a matter of seconds. I could tell you more, but just watch the movie. That’s pretty much what I saw and heard, and I’m nowhere near as good at words as Penn.

Next time, I’m bringing binoculars.