

This project identifies bacteria colonies from images of bacteria plated in agar on a petri dish placed upside-down on a red background. In order to enable rigorous analysis of the bacteria colonies, it is necessary to compute such characteristics as their area. The most important step is to determine which pixels together make up distinct colonies. Once this is accomplished, it is easy to calculate characteristics (like shape and size) of interest to biologists.


To segment the image into colonies, images must be sent through a series of operations.

  • Binarize
    The first step in processing the image is to binarize it at a threshold that brings colonies to the foreground and sends everything else to the background. For these images, a value of 155 on the green channel worked well. Had there been more variation in the images, an automatic thresholding method, such as Otsu's method, would have been required.
  • Open
    Several colonies in each image grew very close to one another, sometimes even merging. The ones that only barely touch after binarizing can be separated through the morphological operation of opening – first eroding, and then dilating the image. Through some testing, it seemed that a general 3x3 box structuring element worked as well as any other, such as a larger one that ignores corner elements. More severe erosion separates more colonies, but at the expense of the colonies losing their integrity.
  • Find Connected Components
    In order to consider colonies individually, it is necessary to group connected components. Region-growing is used to find the components. The classical row-by-row method would work as well.
  • Calculate Properties
    This step produces the measurements of interest. The moments of each component give its area, centroid, and orientation. The orientation is used to find the least and greatest second moments, whose quotient is a measure of circularity.
  • Reduce Rim Noise
    Thresholding and opening the image do not, on their own, succeed in removing the rim from the mask. Using the properties just calculated, however, it is possible to remove more of it. Bacteria colonies are generally circular, whereas many rim segments are linear, so regions with circularity measures under 0.1 can be removed without affecting the actual colonies. There are also very small bits of rim that are fairly circular. To remove them, regions with areas much smaller than those of most regions are also removed. A threshold that works fairly effectively is the point geometrically midway between the 25th percentile and the minimum value.
  • Presentation
    To make the bacteria stand out to the user, the colonies are divided into four equal groups based on size, and all colonies in a group are given the same color. Since it was convenient and interesting, the axis of least second moment is also displayed on each colony. A CSV text file is produced with the attributes of every region, each identified by a number. This number is printed at the centroid of each region for reference.
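
The property calculation and the circularity measure described above can be sketched as follows. This is a minimal sketch, not the project's code: it uses the textbook closed form for the least and greatest second moments, and the function name region_properties and the (row, col) pixel format are assumptions made for illustration.

```python
import math


def region_properties(pixels):
    """Return area, centroid, orientation, and circularity of one region.

    pixels is a list of (row, col) tuples (an assumed format).
    """
    area = len(pixels)
    # Centroid: the mean of the pixel coordinates.
    row_c = sum(r for r, _ in pixels) / area
    col_c = sum(c for _, c in pixels) / area
    # Central second moments, with x along columns and y along rows.
    a = sum((c - col_c) ** 2 for _, c in pixels)
    b = 2 * sum((r - row_c) * (c - col_c) for r, c in pixels)
    d = sum((r - row_c) ** 2 for r, _ in pixels)
    # Angle of the axis of least second moment, measured from the
    # column axis toward increasing rows.
    theta = 0.5 * math.atan2(b, a - d)
    # Least and greatest second moments over all axes through the centroid.
    spread = math.sqrt(b ** 2 + (a - d) ** 2)
    e_min = max(0.0, 0.5 * (a + d) - 0.5 * spread)
    e_max = 0.5 * (a + d) + 0.5 * spread
    # A quotient near 1 means round; near 0 means line-like.
    circularity = e_min / e_max if e_max > 0 else 1.0
    return {"area": area, "centroid": (row_c, col_c),
            "orientation": theta, "circularity": circularity}


square = [(r, c) for r in range(3) for c in range(3)]   # compact region
line = [(0, c) for c in range(5)]                       # rim-like segment
print(region_properties(square)["circularity"])
print(region_properties(line)["circularity"])
```

With a cutoff of 0.1 on this quotient, the compact square would be kept and the straight segment would be discarded as rim noise.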


  • Opening
    To make this method as general as possible, the erosion and dilation methods accept a structuring element of any size, as well as an origin for it. To speed up these slow methods, operations on slices of numpy arrays replace nested loops wherever possible. To "stamp" the structuring element repeatedly, dilation adds 1 to each location the element covers; at the end, all pixels with values greater than zero are raised to 255. It is interesting to view the image just before that final step: the edges are darker than the centers, which looks potentially useful for detection (not that it is a great way to do it).
  • Find Connected Components
    Region-growing can be implemented recursively, but the recursion is very memory-hungry. Using an explicit stack instead implements the algorithm without an inordinate amount of memory. This method produces a list of component lists, each listing the pixel coordinates of one component as tuples.
  • Calculate Properties
    It is trivial to compute the area and centroid from a list of pixel coordinates. Finding the orientation is more difficult. In the end, it seemed easiest to compute the second moments for 4 orientations, and then use Python's min and max functions to identify the least and greatest moments.
  • Reduce Rim Noise
    To handle thresholding on both moments and areas in the same method, it accepts an index into the list of properties generated by the previous method. It also takes as a parameter the list of pixel locations for all of the components, so that it can remove extraneous regions from both lists simultaneously and keep the two lists parallel.
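
The stack-based region growing and the parallel-list filtering described above can be sketched as follows. This is a hedged illustration rather than the project's code: the names find_components and drop_below, the nested-list mask, and the choice of 4-connectivity are all assumptions.

```python
def find_components(mask):
    """Group nonzero pixels into 4-connected components.

    Returns a list of components, each a list of (row, col) tuples.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Grow one region from this seed with an explicit stack
            # instead of recursion.
            stack = [(r0, c0)]
            seen[r0][c0] = True
            region = []
            while stack:
                r, c = stack.pop()
                region.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            components.append(region)
    return components


def drop_below(properties, components, index, threshold):
    """Remove regions whose property at `index` is under `threshold`.

    Both lists are filtered together so that they stay parallel.
    """
    kept = [(p, comp) for p, comp in zip(properties, components)
            if p[index] >= threshold]
    return [p for p, _ in kept], [comp for _, comp in kept]


mask = [[255, 255, 0, 0],
        [255, 0, 0, 255],
        [0, 0, 0, 255]]
components = find_components(mask)
properties = [(len(comp),) for comp in components]  # area only, for brevity
properties, components = drop_below(properties, components, 0, 3)
print(properties, components)
```

Marking a pixel as seen when it is pushed, rather than when it is popped, keeps any pixel from entering the stack twice, so the stack never holds more entries than the region has pixels.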


accuracy = #correctly identified/#of colonies
false positive rate = #false regions/#total regions
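
As a worked example of these two definitions, the following sketch reproduces the 29/37 figure reported below. The function names are hypothetical, and the false-region counts in the second call are made up purely to show the arithmetic.

```python
def accuracy(correctly_identified, colonies):
    """Fraction of the true colonies that were correctly identified."""
    return correctly_identified / colonies


def false_positive_rate(false_regions, total_regions):
    """Fraction of the reported regions that are not colonies."""
    return false_regions / total_regions


print(f"{accuracy(29, 37):.1%}")            # prints 78.4%
print(f"{false_positive_rate(1, 4):.0%}")   # illustrative counts; prints 25%
```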

Accuracy: 100% False Positive Rate: 5.%

Accuracy: 29/37 => 78.4% (a pair of merged colonies is counted as identified once, but as present twice) False Positive Rate: 26%

Accuracy: 87% False Positive Rate: 21%

Accuracy: 85% False Positive Rate: 23%

Accuracy: too low to count. Region 22 alone accounts for a large number of colonies, though, admittedly, I'm not sure that humans could count them very well either. False Positive Rate: 0%

Accuracy: >9% (it is difficult to count the merged regions) False Positive Rate: 1%

Accuracy: 50% False Positive Rate: 2%

Accuracy: 83% False Positive Rate: 23%

Accuracy: 85% False Positive Rate: 3%


Many of the accuracy rates are low because colonies grew together. It would be interesting to see whether, where a perimeter dips inward, a cut could be made across to the dip on the opposite side, severing the merge. Or maybe I am looking at it all wrong, and two colonies that grow together are really just one.
In the last image, number 13 is marked as a colony, but only part of it really is. An alternative method of segmentation, perhaps using edge detection, could improve its representation.