Google introduced a new product, “Goggles”, some time back. I personally think it’s awesome. It uses some cool visual search technology and also implements augmented reality features, such as showing information about a local business when you view it through your phone’s camera. Given the state of visual search technology, I am sure it will take some time to come out of beta, but it is an impressive leap over what is available at present. View a demo at:


A nice compilation of the CVPR 2008 papers available online.

Billboards have so far lacked the accurate metrics that come with internet advertising these days. However, that might soon change if some companies have their way.

An article in the NYTimes talks about two companies, TruMedia and Quividi, that offer metrics on the people who look at a billboard. They fit billboards with cameras and then use face recognition to estimate the age and gender of the people who looked at the billboard. There is talk of providing the racial information of viewers as well, to further target ads. The goal would be “to show one advertisement to a middle-aged white woman, for example, and a different one to a teenage Asian boy”. Such solutions would provide the much-needed metrics that define the effectiveness of a billboard advertisement.

The ad is equipped with a camera that gathers details on passers-by.


A paper from MIT, to appear in CVPR 2008, tries to push the frontiers of object recognition. One of the lessons from modern search engines is that very simple algorithms give remarkable performance when they use data on an internet scale. However, applying such an approach to object recognition is computationally very demanding, from the task of simply downloading 80M images to experimenting with such a huge dataset.

This is the motivation for finding shorter numerical representations that can be derived from an image and still give a useful indication of its content. A short representation allows real-time solutions for object recognition tasks, which are otherwise extremely computationally intensive. As described in this news article, it has been shown that representing images as compact binary codes (as few as 256 bits per image) captures the essential contents of an image. With these codes, a database of 12.9 million images takes up less than 600MB of memory, which easily fits in the memory of a PC. Such a representation would therefore enable real-time searches over millions of images on a single PC. The object recognition results obtained in the paper with the short codes are comparable to the results obtained using full descriptors. Because information is lost when the number of bits representing an image is reduced, complex or unusual images are less likely to be correctly matched. But for the most common objects in pictures (people, cars, flowers, buildings) the results are quite impressive.
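To make the numbers above concrete, here is a minimal sketch in Python. It checks the storage arithmetic (256 bits is 32 bytes per image) and shows one way such codes could be searched. Note the assumptions: comparing codes by Hamming distance and scanning the whole database linearly are my choices for illustration, not something the text above specifies, and the function names are made up.

```python
import random

BITS_PER_IMAGE = 256        # code length quoted above
NUM_IMAGES = 12_900_000     # database size quoted above


def database_size_mb(num_images, bits_per_image):
    """Raw storage for the codes alone, in megabytes (10**6 bytes)."""
    return num_images * bits_per_image / 8 / 1e6


def hamming_distance(a, b):
    """Number of bit positions where two integer-encoded codes differ."""
    return bin(a ^ b).count("1")


def nearest_codes(query, codes, k=3):
    """Indices of the k codes closest to the query (assumed Hamming metric)."""
    order = sorted(range(len(codes)),
                   key=lambda i: hamming_distance(query, codes[i]))
    return order[:k]


if __name__ == "__main__":
    # 12.9 million codes of 32 bytes each: well under the 600MB quoted.
    print(database_size_mb(NUM_IMAGES, BITS_PER_IMAGE), "MB")

    # Toy database of random codes; real codes would be derived from images.
    rng = random.Random(0)
    codes = [rng.getrandbits(BITS_PER_IMAGE) for _ in range(1000)]
    query = codes[42] ^ 0b111   # a copy of code 42 with three bits flipped
    print(nearest_codes(query, codes))
```

The codes alone come to roughly 413MB, which is consistent with the "less than 600MB" figure once some overhead is allowed for.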

Che Guevara

The same project also created a cool app called the visual dictionary. A list of all nouns in the English language was obtained from WordNet. Images for each word were obtained through Google image search. Each tile in the image above is the average of 140 images. This average represents the dominant visual characteristics of each word. The average could be a recognizable image (such as Che Guevara above) or a colored blob. The tiles are arranged such that the proximity of two tiles is indicative of their semantic distance. Thus the poster explores the relationship between visual and semantic similarity.
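The averaging step is simple enough to sketch. The snippet below is only an illustration under my own assumptions: images are equally sized grayscale grids stored as lists of rows, and `average_image` is a hypothetical helper, not code from the project.

```python
def average_image(images):
    """Pixel-wise mean of equally sized grayscale images (lists of rows)."""
    if not images:
        raise ValueError("need at least one image")
    height, width = len(images[0]), len(images[0][0])
    total = [[0.0] * width for _ in range(height)]
    for img in images:
        for y in range(height):
            for x in range(width):
                total[y][x] += img[y][x]
    n = len(images)
    return [[value / n for value in row] for row in total]


if __name__ == "__main__":
    # Two tiny 1x2 "images"; the real tiles average 140 images per word.
    print(average_image([[[0, 2]], [[4, 6]]]))
```

If the images for a word look alike, the mean keeps their shared structure; if they vary a lot, the details cancel out and what is left is the colored blob mentioned above.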

A recent paper from Google describes a new image search algorithm that ranks images based on their visual similarity. This NYTimes article and this one give a good introduction to the algorithm. I will try to give an overview of their approach:

  • They extract local descriptors (SIFT descriptors) on the images.
  • The similarity between two images is defined as the number of interest points (descriptor vectors) shared between the two images divided by their average number of interest points.
  • The similarities between images are treated as probabilistic visual hyperlinks (this is necessary as there are no actual links between the images), which makes it possible to use the PageRank algorithm for ranking.
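The steps above can be sketched roughly as follows. This is my own simplified reading, not the authors' implementation: the shared interest point counts are taken as given (SIFT matching is skipped), the damping factor of 0.85 is the conventional PageRank default rather than a value from the paper, and all function names are hypothetical.

```python
def similarity(shared, n_a, n_b):
    """Shared interest points divided by the average number of interest points."""
    return shared / ((n_a + n_b) / 2)


def similarity_matrix(num_points, shared):
    """Symmetric matrix built from per-image point counts and shared counts."""
    n = len(num_points)
    sim = [[0.0] * n for _ in range(n)]
    for (a, b), count in shared.items():
        sim[a][b] = sim[b][a] = similarity(count, num_points[a], num_points[b])
    return sim


def visual_rank(sim, damping=0.85, iterations=100):
    """PageRank by power iteration, using similarities as link weights."""
    n = len(sim)
    col_sums = [sum(sim[i][j] for i in range(n)) for j in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = []
        for i in range(n):
            flow = 0.0
            for j in range(n):
                # An image with no links spreads its rank uniformly.
                weight = sim[i][j] / col_sums[j] if col_sums[j] > 0 else 1.0 / n
                flow += weight * rank[j]
            new_rank.append((1 - damping) / n + damping * flow)
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Made-up counts: image 0 shares many points with every other image.
    num_points = [100, 80, 90, 70]
    shared = {(0, 1): 40, (0, 2): 45, (0, 3): 30,
              (1, 2): 10, (1, 3): 5, (2, 3): 5}
    ranks = visual_rank(similarity_matrix(num_points, shared))
    print(max(range(len(ranks)), key=ranks.__getitem__))
```

In this toy setup the image that shares the most features with all the others collects the most rank.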

The above ranking method can be interpreted as finding multiple visual themes and their strengths in a large set of images and using these to rank them. An example from the paper is shown below. There are many comic representations of the Mona Lisa, and all of them are based on the original painting. The original painting will contain more matched local features than the others (and hence will be rated as having stronger visual hyperlinks). As seen in the image below, the center of the graph contains images corresponding to the original version of the painting.

The authors of the paper above have posted some clarifications about the paper here.

On a similar note, I came across a good talk on image retrieval, especially semantic image retrieval.

Using Statistics to Search and Annotate Pictures -> Gives a brief introduction to image retrieval (query by sketch, query by example) followed by the concept of semantic image retrieval.