- 1 Introduction
- 2 Theory
- 2.1 Reading and preprocessing an image
- 2.2 Thresholding into binary image
- 2.3 Segmentation
- 2.4 Connected components
- 2.5 Elimination of false colonies
- 2.6 Calculating the values
- 2.7 Flattening and output
- 3 Methods
- 4 Results
- 5 Discussion
In this assignment, a program is created that allows the user to analyze images of a Petri dish with bacteria colonies growing on it. The program outputs the original image augmented with the detected, numbered colonies, and a comma-separated values file with properties of the colonies (including the location of each colony, its perimeter and area, and the probability that the region is de facto a colony).
The program rests on numerous assumptions, including but not limited to the following: most colonies are shaped roughly like circles, the colonies are not growing together, and small structures may be specks or leftovers from processing. To complete this assignment we needed to implement functions that let us manipulate the image histogram, detect connected pixels, and determine their properties.
Detecting a colony of bacteria can be broken down into a series of steps, as follows:
Reading and preprocessing an image
In this step the image is read in and preprocessed by extracting the green channel from the RGB picture. The green channel is used because green combines with red to form yellow: since the colonies are yellow and the background is red, most of the contrast between them is stored in the green channel. The green channel image gives us a base to work with in the next step.
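The extraction itself is a single slice of the image array. A minimal sketch with numpy (note that OpenCV loads images in BGR order, but green is index 1 in both RGB and BGR, so the slice is the same either way):

```python
import numpy as np

def green_channel(img):
    """Extract the green channel of an H x W x 3 image.

    Green carries most of the colony/background contrast here:
    yellow colonies have high green, the red background has low green.
    (Index 1 is green in both RGB and OpenCV's BGR ordering.)
    """
    return img[:, :, 1]

# Tiny synthetic example: one "yellow" pixel on a "red" background.
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[:, :] = (200, 30, 30)       # red-ish background (RGB)
img[0, 0] = (220, 210, 40)      # yellow-ish colony pixel
g = green_channel(img)
```

The colony pixel stands out clearly in `g` (210 against a background of 30), which is exactly the contrast the thresholding step exploits.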
Thresholding into binary image
In this step, we apply thresholding to find a value above which the pixels in the image most probably represent colonies of bacteria. This can be done in two ways: by a simple, nearly hardcoded threshold (based on a semi-handpicked cutoff value) or by the smarter, automated Otsu thresholding. (extension, partially)
In standard thresholding we rely on some preliminary trials done with the picture beforehand. Since we will be processing the same type of image over and over again, we can fine-tune it in a very specific manner. In this case, by empirical checking, we find that the colonies tend to start about 70 brightness levels above the minimal brightness value in the picture. We implement a procedure that marks all points below that value as black (background) and the presumed colonies as white.
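The fixed-offset rule above can be sketched in a few lines of numpy (the offset of 70 is the empirically picked value from the text; everything else is illustrative):

```python
import numpy as np

def fixed_offset_threshold(gray, offset=70):
    """Binarize: pixels brighter than (image minimum + offset) become
    white (presumed colony), everything else black (background).
    The offset of 70 is the empirically tuned value."""
    cutoff = int(gray.min()) + offset
    return (gray > cutoff).astype(np.uint8) * 255

gray = np.array([[10, 20, 95],
                 [15, 90, 100]], dtype=np.uint8)
mask = fixed_offset_threshold(gray)   # cutoff = 10 + 70 = 80
```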
For an in-depth definition of Otsu's method, refer to this Wikipedia article.
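The program itself uses OpenCV's built-in Otsu thresholding (see Methods), but for reference, here is a minimal numpy sketch of the core idea: pick the threshold that maximizes the between-class variance of the histogram:

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold that maximizes between-class variance
    (Otsu's method), computed from the 256-bin histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()   # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0        # class means
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Bimodal test image: dark background around 20, bright blobs around 200.
gray = np.array([[20, 22, 200], [21, 198, 202]], dtype=np.uint8)
t = otsu_threshold(gray)
binary = (gray >= t).astype(np.uint8)
```

For a cleanly bimodal image like this one, the chosen threshold lands between the two clusters, so the split matches the intuitive one.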
Segmentation
Through the use of closing and opening (compound functions of binary dilation and erosion, which, as their names imply, respectively cause the colonies in the binary image to grow and shrink), we can achieve a smoother picture and possibly remove some specks as well as the very thin edge of the Petri dish.
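Since most methods in this project were hand-implemented, here is a minimal numpy sketch of 3×3 binary erosion and dilation and the opening/closing built from them (the helper names are illustrative, not the actual morph.py API):

```python
import numpy as np

def _neighborhood(mask, reduce_fn):
    # Pad with zeros and take the min/max over each 3x3 neighborhood.
    p = np.pad(mask, 1)
    stack = [p[i:i + mask.shape[0], j:j + mask.shape[1]]
             for i in range(3) for j in range(3)]
    return reduce_fn(np.stack(stack), axis=0)

def dilate(mask):   # grow white regions
    return _neighborhood(mask, np.max)

def erode(mask):    # shrink white regions
    return _neighborhood(mask, np.min)

def opening(mask):  # erosion then dilation: removes small specks
    return dilate(erode(mask))

def closing(mask):  # dilation then erosion: fills small holes
    return erode(dilate(mask))

# A lone 1-pixel speck disappears under opening; a solid block survives.
m = np.zeros((7, 7), dtype=np.uint8)
m[1, 1] = 1                    # speck (noise)
m[3:6, 3:6] = 1                # 3x3 block (colony)
opened = opening(m)
```

This is exactly the speck-removal effect used on the thresholded mask: the isolated pixel is eroded away and never comes back, while the solid block shrinks to its center and is restored by the dilation.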
Connected components
Connected components is a technique that lets us detect structures within the image that are separated from other structures; in our case these will be colonies and edges. By scanning each pixel for connections to its neighbors (hence "connected"), it splits the image into labeled components, which creates a mask of colonies that can be used in later calculations.
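A hand-rolled labeling pass can be sketched as a flood fill from each unlabeled white pixel (the function name and 4-connectivity choice here are illustrative assumptions, not the actual morph.py implementation):

```python
import numpy as np
from collections import deque

def label_components(mask):
    """Label 4-connected white regions of a binary mask.
    Returns an int array (background 0, each component a distinct
    positive label) and the number of components found."""
    labels = np.zeros(mask.shape, dtype=int)
    h, w = mask.shape
    current = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and labels[sy, sx] == 0:
                current += 1                 # new component found
                labels[sy, sx] = current
                queue = deque([(sy, sx)])
                while queue:                 # BFS flood fill
                    y, x = queue.popleft()
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = current
                            queue.append((ny, nx))
    return labels, current

m = np.array([[1, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 1, 0, 1]], dtype=np.uint8)
labels, n = label_components(m)   # three separate components
```

The resulting label image is precisely the kind of per-colony mask that the later property calculations and removals operate on.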
Elimination of false colonies
Since the edges of the Petri dish (as well as reflections) have colors similar to the colonies, it is crucial to remove them from the mask before we try to estimate the real number and size of the colonies. These removals are mostly based on the area of the detected regions, and can be done in two ways:
Removal of outliers
After calculating the mean and standard deviation of the areas of what we think are colonies, we can assume that items far from the mean (e.g. more than two standard deviations away) are probably not of interest: they are either too big to be colonies (e.g. a glass reflection) or they are inherent noise in the image. Thus, we calculate these values and remove the outliers from the mask.
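The selection rule can be sketched in a couple of numpy lines (names and the cutoff k=2 follow the description above; the actual morph.py code differs):

```python
import numpy as np

def area_inliers(areas, k=2.0):
    """Keep regions whose area is within k standard deviations
    of the mean area; flag the rest as outliers to remove."""
    areas = np.asarray(areas, dtype=float)
    mu, sigma = areas.mean(), areas.std()
    return np.abs(areas - mu) <= k * sigma

# Nine colony-sized regions and one huge glass reflection.
areas = [30, 32, 28, 31, 29, 33, 27, 30, 31, 400]
keep = area_inliers(areas)   # the reflection fails the test
```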
Removal of non-circles
Most of the colonies were noted to be circles (or vaguely resembling them). The mathematical formula for the circumference of a circle is 2πr, while for its area it is πr². The ratio between these two provides essential information about whether we are dealing with a circle or not. By establishing thresholds on this ratio, we can fairly safely remove the specks and glass reflections from the detection.
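A standard way to combine the two is the circularity metric 4πA/P², which equals exactly 1 for a perfect circle (since P = 2πr and A = πr² give P² = 4πA) and drops for elongated or ragged shapes. A sketch with a hypothetical cutoff (the 0.7 threshold is an illustrative assumption, not the value used in morph.py):

```python
import math

def circularity(area, perimeter):
    """4*pi*A / P^2: exactly 1 for a perfect circle
    (P = 2*pi*r, A = pi*r^2, so P^2 = 4*pi*A), lower otherwise."""
    return 4.0 * math.pi * area / (perimeter ** 2)

def looks_like_circle(area, perimeter, threshold=0.7):
    # Hypothetical cutoff; tune against real colony shapes.
    return circularity(area, perimeter) >= threshold

r = 5.0
circle = circularity(math.pi * r * r, 2.0 * math.pi * r)   # 1.0
square = circularity(1.0, 4.0)   # unit square: pi/4, about 0.785
```

Even a square scores around 0.785, so elongated dish edges and ragged reflections, whose perimeters are large relative to their area, fall well below any reasonable cutoff.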
Calculating the values
After recreating the connected components upon removal of certain regions from the original mask, a set of values (locations, areas, etc.) is derived from the moments of the colonies and other properties. These will later be presented together with the picture.
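For a binary region, the raw moments reduce to simple sums, and the centroid is x̄ = M10/M00, ȳ = M01/M00. A minimal sketch of deriving location and area for one region (illustrative names, not the actual morph.py code):

```python
import numpy as np

def region_properties(region_mask):
    """Centroid and area of one binary region via raw moments:
    M00 = pixel count (area), M10 = sum of x, M01 = sum of y."""
    ys, xs = np.nonzero(region_mask)
    m00 = len(xs)                 # area in pixels
    cx = xs.sum() / m00           # M10 / M00
    cy = ys.sum() / m00           # M01 / M00
    return {"area": m00, "centroid": (cx, cy)}

# A 3x3 block whose center is at (x=3, y=2).
m = np.zeros((5, 6), dtype=np.uint8)
m[1:4, 2:5] = 1
props = region_properties(m)
```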
Flattening and output
The last step is putting all the data on the output image and preparing the CSV file for printout.
morph.py was implemented in Python, using mostly the numpy and OpenCV libraries. Numpy was the workhorse this time, significantly decreasing the amount of work necessary to prepare and analyze the images. A number of built-ins also reduced both writing time and execution time: for example, matrix.where(value) was used to select masses from the mask that were to be removed, at least an order of magnitude faster than the next best option (scanning the image for any pixels with the sought mass).
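The vectorized selection idea can be sketched with plain numpy (the function and variable names here are illustrative; the actual matrix.where wrapper in morph.py differs):

```python
import numpy as np

def remove_labels(labels, labels_to_drop):
    """Zero out every pixel whose component label is in labels_to_drop,
    in one vectorized pass instead of rescanning the image per label."""
    return np.where(np.isin(labels, labels_to_drop), 0, labels)

labels = np.array([[1, 1, 0],
                   [0, 2, 2],
                   [3, 0, 2]])
cleaned = remove_labels(labels, [2])   # drop component 2, keep 1 and 3
```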
morph.py contains a number of wrapper methods that make swapping between OpenCV and numpy matrices easier and pretty much abstract them all into a single image object that is passed around.
Throughout, we keep swapping between two (three) images: one of them is the green channel of the image, thresholded and binarized (called mask); another (called npImage) is the mask after connected-components processing. While the two images are largely the same, npImage contains the descriptions and properties of all the colonies in the image, which are used later. After the extraction algorithms (which detect non-colonies) are used to remove regions from the mask, the connected-components check is run again against the cleaned mask to verify how many colonies with all their properties remain (to check that we have not made a mistake by any chance). This also avoids running into segfaults or multiple removals of colonies.
After that, the colors in the image are randomized to obtain an eye-pleasing color scheme for the final image. The mask is then flattened onto the original color image, labeled, and provided to the user together with the CSV file.
The majority of methods were hand-implemented to get a better grasp of their construction principles, but Otsu's thresholding was simply lifted from OpenCV (it is built in as part of OpenCV's thresholding functions). While a hand implementation is possible, since a fast, down-to-C implementation already exists in the library I use, I saw no particular point in reimplementing it. As a result, the method is now resistant to changes in exposure or brightness of the picture, as the colonies are detected based on their difference from the background, not their absolute color.
The following is a series of animations that show the progression of the mask, from the green channel to the final mask that was later used to produce the final color map on the picture.
The following images are examples of the program's output with their properties calculated (i.e. false-positive and accuracy rates).
In the first picture, it is clear that morph.py handled the edge extremely well: there is not a single stray edge left to see. At the same time, while the colonies are rather medium-sized, the smallest colonies end up being ignored in favor of the large ones. We detected 44 colonies, while there are de facto 55 of them. Thus, the accuracy rate is 80%, while the false-positive rate is 0.
In the second picture we see a situation similar to the first. There is a large number of medium-sized colonies that are detected properly, a tiny one is skipped, and there is not a single mismatch. Noticeably, however, one larger colony is skipped. I can actually understand why this happens: the colony is barely visible even to the human eye and basically melts into the glass reflection, which is rightly removed from the picture. With 90% accuracy and 0% false positives, it is the best result in the whole lab.
With the third picture, we get most of the colonies selected properly. However, Otsu's thresholding causes two reflections to enter the image, one of them merged with a colony. This leaves an ugly, non-circular speck at the bottom of the image. The top ones are not removed either. With 91% accuracy it is not bad, but the 5% error rate sounds bad.
The fourth picture gives us an example of failure on images with high colony density. Accuracy is about 10%, though at least there are no false positives. The issue stems from the large number of colonies that grow together during dilation and are removed as one big chunk that does not fit the circular description. At the same time, it would be possible to tune the system to avoid this sort of issue if one required it.
This picture confirms that there really is an issue with small colonies: they are simply removed as outliers and as not fitting the circle description. Since a lot of colonies grow together here, yet many of them are spread well apart, the system does not merge them into one huge blob for removal; instead it does a decent job of detecting them (while also adding the glass reflection as a blob). With 45 real colonies detected, the accuracy rate is around 60%, while the error rate is 2%.
With the colonies nicely spread around and relatively large, this is an excellent example of recognition. One hundred percent accuracy for the colonies is unprecedented, but there is a large, 10% error rate.
The final image is a mix of medium and small colonies, evenly spread, that the algorithm does a good job with. There are no false positives, but with 55 colonies detected, the accuracy is just 81%.
My algorithm tended to do great whenever the colonies did not grow too close to each other. Circularity identification did a stupendous job of filtering the edges out of the picture most of the time, as did erosion and dilation. However, in certain situations the outlier removal would cause large blobs of data to disappear, making the accuracy rate very low.
While we got very few false positives, one could claim that this is only due to a very particular setting that we tweaked well for, and that might be true.
I noticed that these customized wrappers sped up debugging and processing times significantly, although sometimes, for the sake of clarity, the same operation is carried out many times when it could simply be read from a cache (e.g. the map of objects is calculated at least twice as many times as it should be at most). These are, however, fairly linear operations (time-wise), so we need not preoccupy ourselves with simplifying them away at the cost of code clarity: we are not running in a realtime environment, and with at most a dozen pictures to analyze, a saving of 5 seconds per picture does not seem worth it at such a scale.
As for further growth, I would definitely pursue hardcoding the idea of a Petri dish into the recognition app: with it, we could remove all the regions that lie on the edge of the thresholded Petri dish, making the mask 100% clean of the dish in most cases. This would allow us to skip the circularity check in most cases (as we would get a really well-fitted picture of mostly colonies).
As for the extension, I implemented only Otsu's thresholding (using OpenCV). Using my area and perimeter metrics, one could implement a more interesting primitive shape recognition, but that is definitely not something I would have done for this assignment.
All in all, I think this project was a huge success: I learnt a lot, and I was fairly stupefied when I realized that my system successfully recognizes some really convoluted blobs of bacteria. It makes you feel proud.