
Assignment 4 - CBIR

by Bogo Giertler


The objective of the assignment was to create a CBIR (Content-Based Image Retrieval) system. This type of system allows the user to search for images using an image, rather than text, as the query. The response is a set of the best-matching pictures, which will hopefully match, at least partially, what we see in the query picture. The techniques we used are similar to those of the advanced Google Images search, where the search is based on the particular coloring of the image. The results of a CBIR system could also be used in image completion or content-based editing systems, where finding the most similar images is necessary to complete the picture or transform it appropriately.

In this project, the user provides an image through a command-line interface and receives, as a response picture, a contact sheet containing the query picture, the matching pictures, and their scores and names. A Cocoa interface was planned but, unfortunately, not completed.

Theory behind

All of our techniques in this assignment rely on the concept of a three-dimensional histogram, which represents the color distribution of the image in a given color space (e.g. red, green and blue for RGB, or lightness and the color dimensions a and b for Lab). Then, using a distance metric, we establish the affinity between pictures. We can use the L1 (Manhattan) distance, the L2 (Euclidean) distance, or histogram intersection. As expected, results vary depending on the color space we operate in - for example, Lab is a better approximation of the intuitive, human color space than the RGB scheme, where color changes and transitions are not natural.
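As a rough illustration of the histogram and the three comparison measures, here is a minimal sketch in plain Python. It is not the code used in the assignment: the function names, the sparse dictionary representation, and the assumption of a non-empty list of 8-bit (r, g, b) tuples are all mine.

```python
from collections import Counter
from math import sqrt

BINS = 16  # bins per channel, matching the 16x16x16 histograms in the text


def color_histogram(pixels, bins=BINS):
    """Sparse, normalized 3D histogram of 8-bit (r, g, b) tuples.

    Maps (r_bin, g_bin, b_bin) -> fraction of pixels in that cell.
    Assumes `pixels` is non-empty (hypothetical input format).
    """
    width = 256 // bins
    counts = Counter((r // width, g // width, b // width) for r, g, b in pixels)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def l1_distance(h1, h2):
    """Manhattan distance: 0 for identical histograms, 2 for disjoint ones."""
    return sum(abs(h1.get(c, 0.0) - h2.get(c, 0.0)) for c in h1.keys() | h2.keys())


def l2_distance(h1, h2):
    """Euclidean distance between the two histograms."""
    return sqrt(sum((h1.get(c, 0.0) - h2.get(c, 0.0)) ** 2
                    for c in h1.keys() | h2.keys()))


def intersection(h1, h2):
    """Histogram intersection: 1 for identical histograms, 0 for disjoint ones."""
    return sum(min(v, h2.get(c, 0.0)) for c, v in h1.items())
```

Note that the two distances get smaller as pictures become more alike, while the intersection gets larger, so a ranking has to sort in opposite directions depending on the measure.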


In this assignment we are dealing with a database containing well over a thousand pictures, so the speed of computation was crucial. Since we mostly just grow an already established database of pictures whose histograms do not change, the most reasonable solution was to use the Python module shelve. Shelves are hash databases stored on the hard drive that provide a sort of persistence framework for a Python app, allowing for simplified storage and data management. With a live database that could be easily updated (by rerunning the query on a specific item) and extended (by rerunning the query on an updated folder), the shelf records were used for storing the RGB histograms of the images. Later, the images were split into a 4x4 grid of chunks for a better partial analysis. Since the sum of the RGB histograms of the chunks is equal to the histogram of the whole image, it was only necessary to store a (4x4)x16x16x16 matrix. (While the 4x4 grid could be represented as 16 matrices, the 4x4 layout allowed for easier human access to the chunks in relation to the whole picture.)
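The storage idea can be sketched with the standard shelve module as follows. This is only an illustration under my own assumptions: the helper names, the database path, and the use of a Counter of raw bin counts per chunk are hypothetical, not the layout of the original database.

```python
import shelve
from collections import Counter


def store_histograms(db_path, name, block_hists):
    """Persist a 4x4 grid of per-chunk histograms under the image's filename."""
    with shelve.open(db_path) as db:
        db[name] = block_hists


def load_histograms(db_path, name):
    """Read the 4x4 grid of per-chunk histograms back from the shelf."""
    with shelve.open(db_path) as db:
        return db[name]


def total_histogram(block_hists):
    """Sum the raw bin counts of all chunks to recover the whole-image histogram."""
    total = Counter()
    for row in block_hists:
        for hist in row:
            total.update(hist)  # Counter.update adds counts bin by bin
    return total
```

Because the chunk histograms hold raw counts, the whole-image histogram never has to be stored separately; it can be rebuilt on demand with total_histogram.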

With these methods employed, calculating the histogram intersection of the complete pictures establishes a general connection between the presence (but not the spatial distribution) of colors in the pictures. To account for the spatial distribution as well, the 4x4 grid was employed and an intersection score was calculated for every one of the 16 blocks. The four central scores are then averaged (as it can be safely assumed that the object of interest is probably in the center of the frame) and assigned a weight of 50%. The twelve outer blocks are averaged and assigned the remaining 50% of the score. Thus, each block in the center of a picture (possibly containing the object of focus) contributes 12.5% of the score, while each outer block (probably unimportant for the picture) contributes only about 4.2% (i.e. a central block is three times as important as an outer one).
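The weighting described above can be written down in a few lines. This is a sketch, assuming the 16 per-block scores are given as a 4x4 list of lists; the names CENTER and weighted_score are hypothetical.

```python
# Row/column indices of the four central blocks in the 4x4 grid.
CENTER = {(1, 1), (1, 2), (2, 1), (2, 2)}


def weighted_score(block_scores):
    """Combine a 4x4 grid of per-block scores.

    The mean of the four central blocks and the mean of the twelve outer
    blocks each contribute 50% of the final score.
    """
    center = [block_scores[r][c] for r, c in CENTER]
    outer = [block_scores[r][c]
             for r in range(4) for c in range(4) if (r, c) not in CENTER]
    return 0.5 * sum(center) / len(center) + 0.5 * sum(outer) / len(outer)
```

A single central block therefore carries 0.5 / 4 = 12.5% of the score and a single outer block 0.5 / 12, or about 4.2%.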

A rough GUI was created instead of the PyObjC app that was supposed to provide an interface to the application. PyObjC might still be used in Mia's and my final project, however. Instead, the Tiles assignment was partially recycled to display the pictures as tiles. Text annotation and color coding (to indicate the validity of each picture) were added as well.


The prescribed set of pictures was created - a set of 38 labeled pictures, with an orange as the dominant object in 12 pictures. It is available for public download.
The algorithms employed seemed to work best with pictures containing large areas of nearly constant color - e.g. blue sky or green grass will very often produce similar results, but probing for a basketball might yield photos that resemble the background more than the ball itself. The results varied from mediocre to excellent, and are listed here as compared to the common baseline. All of these tests were run with the 4x4 image split and weighting.

  • pic.0953.jpg
    • pic.0250.jpg - 0.43
    • pic.0248.jpg - 0.43
    • pic.0954.jpg - 0.42
    • pic.0946.jpg - 0.37
    • pic.0251.jpg - 0.37
    • pic.1234.jpg - 0.31
    • pic.1068.jpg - 0.29
    • pic.1229.jpg - 0.28
    • pic.0121.jpg - 0.26

  • pic.1215.jpg
    • pic.1190.jpg - 0.36
    • pic.1263.jpg - 0.35
    • pic.1283.jpg - 0.33
    • pic.1212.jpg - 0.31
    • pic.0217.jpg - 0.30
    • pic.0557.jpg - 0.29
    • pic.0992.jpg - 0.29
    • pic.1028.jpg - 0.29
    • pic.0629.jpg - 0.29

  • pic.0002.jpg
    • pic.0864.jpg - 0.32
    • pic.0199.jpg - 0.32
    • pic.0001.jpg - 0.30
    • pic.1062.jpg - 0.29
    • pic.0217.jpg - 0.28
    • pic.0583.jpg - 0.28
    • pic.0909.jpg - 0.28
    • pic.1168.jpg - 0.28
      (a duplicate occurs because image 2 replaces image 420, which is missing)

  • pic.0135.jpg
    • pic.0123.jpg - 0.49
    • pic.0368.jpg - 0.47
    • pic.0688.jpg - 0.44
    • pic.1140.jpg - 0.42
    • pic.0367.jpg - 0.40
    • pic.0680.jpg - 0.40
    • pic.0500.jpg - 0.39
    • pic.0719.jpg - 0.39
    • pic.0142.jpg - 0.37

  • 13.jpg
    • 14.jpg - 0.25
    • 37.jpg - 0.19
    • 36.jpg - 0.19
    • 12.jpg - 0.15
    • 38.jpg - 0.12
    • 33.jpg - 0.12
    • 34.jpg - 0.11
    • 8.jpg - 0.11
    • pic.0921.jpg - 0.11


As we can see, the deciding factor in image selection is color (and, with the additional 4x4 weighting, to some extent also its spatial arrangement) - thus, with dull, gray images (or images without significantly different color components or dominant colors) we might run into the issue of picking images that are similar in color but completely different in content. A good example is pic.0002 (which was my first run and was very disappointing), where, although we pick some similar pictures, the majority are simply similar in the presence of grays.
We could see some excellent performance with pic.0953.jpg, where the algorithm picked all the pictures with grass - it even picked some with our squirrel. pic.1215.jpg also performed well, but that might simply be a matter of selecting all the pictures with a steel-gray sky and some greens and grays (buildings).
The best performance was with 13.jpg, where the system almost flawlessly selected all the pictures of the orange in the database. It is undeniable, however, that these pictures had the orange in significant focus, so weighting the central four squares was bound to yield very positive results.

Since some of the pictures were rotated, an additional improvement would be to run the histogram checks also against the images rotated by 90, 180 and 270 degrees, as it is possible that the object of interest lies somewhere else within the rotated picture. It would also be interesting to integrate some thresholding, i.e. shifting the channels slightly left or right, to make sure we are not accidentally skipping some colors - however, the 16 bins already alleviate this problem to a large extent.
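The rotation check was not implemented, but one way it might look is sketched below. It relies on an assumption of mine: since a chunk's histogram ignores where pixels sit inside the chunk, rotating an evenly divided picture by a multiple of 90 degrees only rearranges the 4x4 grid, so the stored grid can be rotated without recomputing any histogram. The names rotate_grid and best_rotation_score, and the score callable that compares two grids, are hypothetical.

```python
def rotate_grid(grid):
    """Rotate a square grid of per-chunk values by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def best_rotation_score(query_grid, candidate_grid, score):
    """Best score over the candidate rotated by 0, 90, 180 and 270 degrees.

    `score(query_grid, candidate_grid)` is any function that compares two
    grids and returns a number where higher means more similar.
    """
    best = float("-inf")
    rotated = candidate_grid
    for _ in range(4):
        best = max(best, score(query_grid, rotated))
        rotated = rotate_grid(rotated)
    return best
```

The cost is four comparisons per candidate instead of one, which stays cheap because only the stored grids are touched.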

Lastly, I ran into an issue with the shelve module on 64-bit Python. In this case, the database would start ballooning, growing roughly exponentially from about the 50th picture and quickly reaching sizes of hundreds of gigabytes. Since querying the database for its keys returned incomplete data, it can be conjectured that shelve on 64-bit Python has some performance and persistence issues. This problem was solved by basically transitioning back to OS X Leopard, 10.5.

I have received mixed feedback from the friends to whom I presented it (especially my roommate), and the general consensus was that, while it is an interesting technological trick for combing through large picture databases, this method of selection might not be entirely appropriate. It is definitely useful in certain situations (e.g. picking images of a single, focused, distinctively colored object or picking images of a general landscape), but it will fail for subtler choices. In general, it should perhaps serve as an assisting algorithm in more complicated databases, or be given a very large database, to function properly. It was also kind of fulfilling to see that a photo of an orange taken with a camera phone and run through the program actually yielded images of oranges! There is hope for Computer Vision :)
