The Descriptive Camera works a lot like a regular camera: point it at a subject and press the shutter button to capture the scene. However, instead of producing an image, this prototype outputs a text description of the scene. Rather than using some kind of advanced AI that can read images, it simply passes the description task to Amazon’s Mechanical Turk system, which in turn distributes the job to humans. Between 3 and 6 minutes after the picture is taken (slightly longer than a Polaroid takes to develop), the Descriptive Camera spits out the description on a piece of paper.
Amazon Mechanical Turk, for those who are unfamiliar, is a crowdsourcing Internet marketplace that lets developers (known as Requesters) submit Human Intelligence Tasks (HITs) for workers on the internet to complete. These are tasks that computers struggle to perform, such as choosing the best among several photographs, writing product descriptions, or describing the content of an image.
The camera is built on an embedded Linux platform (BeagleBone). It takes pictures using a USB webcam and prints the output on an attached thermal printer. A series of Python scripts define the interface and tie the different parts together, from capture and processing to error handling and the printed output. The device connects to the internet via Ethernet and gets power from an external 5 volt source.
After the shutter button is pressed, the photo is sent to Mechanical Turk for processing and the camera waits for the results. A yellow LED indicates that the results are still "developing". With a HIT price of $1.25, results typically come back within 6 minutes, and sometimes in as little as 3 minutes. The thermal printer outputs the resulting text in the style of a Polaroid print.
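To make that flow concrete, here is a rough Python sketch of a single shutter press. It is not the creator's actual code: the function name develop_photo, the five callables passed into it, the 10-second polling interval, the 10-minute timeout and the 32-character line width are all assumptions made for illustration. The real scripts would replace the callables with webcam, Mechanical Turk, LED and printer code.

```python
import textwrap
import time
from typing import Callable, List, Optional


def develop_photo(
    capture: Callable[[], bytes],
    submit: Callable[[bytes], str],
    fetch_result: Callable[[str], Optional[str]],
    set_led: Callable[[bool], None],
    print_lines: Callable[[List[str]], None],
    poll_seconds: float = 10.0,      # assumed polling interval
    timeout_seconds: float = 600.0,  # assumed upper bound on the wait
    line_width: int = 32,            # assumed width of the printer paper
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Handle one shutter press; return True if a description was printed."""
    photo = capture()            # grab a frame from the webcam
    task_id = submit(photo)      # hand the photo to the human workers
    set_led(True)                # LED on: the result is "developing"
    try:
        waited = 0.0
        while waited <= timeout_seconds:
            text = fetch_result(task_id)  # None until a worker has answered
            if text is not None:
                # Wrap the description so it fits the narrow paper roll.
                print_lines(textwrap.wrap(text, width=line_width))
                return True
            sleep(poll_seconds)
            waited += poll_seconds
        return False             # gave up: no description arrived in time
    finally:
        set_led(False)           # LED off whether or not anything printed
```

Passing the hardware and network steps in as functions keeps the waiting logic testable on a desktop machine, and the finally clause ensures the LED does not stay lit if a step raises an error.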
Check out the creator’s blog for sample output from the Descriptive Camera.