Straight lines are prevalent in images of man-made objects such as buildings, while natural objects like trees and grass rarely contain perfectly straight lines. By detecting only the straight lines in an image, therefore, one can remove distracting features and be left with only the lines of the object of interest. These lines can then be used to measure the features of a building, using a scale calculated from an easily measurable part of the object.
Edges are found by detecting gradients, so it is useful to smooth the image before this step to reduce variation caused by noise. A Gaussian filter initially seems like a poor choice because it blurs the edges along with the noise, whereas a bilateral filter preserves edges by giving less weight to intensities that differ from the center pixel. In practice, however, the Gaussian works better, mostly because images contain many minor edges that should be blurred rather than sharpened.
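A minimal sketch of this pre-smoothing step, assuming SciPy's `gaussian_filter` and an illustrative sigma (the report does not specify the exact kernel):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# A noisy step edge: Gaussian blurring suppresses the noise, at the cost
# of softening the edge itself (the trade-off discussed above).
rng = np.random.default_rng(0)
step = np.zeros((32, 32))
step[:, 16:] = 100.0
noisy = step + rng.normal(0.0, 5.0, step.shape)

smoothed = gaussian_filter(noisy, sigma=1.5)  # sigma is illustrative
```

The noise in the flat regions shrinks, but the step is also spread across several columns, which is why the filter choice matters.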
A common form of edge detection is to correlate the image with Sobel filters. Any odd size can work, but a 3×3 filter is not only faster than a 5×5; it also produced better line detection, as shown in the results section. This is probably because the sharper edges are the interesting ones, so a larger filter only finds more distractions.
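The gradient computation can be sketched with the standard 3×3 Sobel kernels (the sign and orientation conventions here are assumptions; the report does not list the kernels it used):

```python
import numpy as np
from scipy.ndimage import correlate

# Standard 3x3 Sobel kernels: SOBEL_X responds to vertical edges,
# its transpose to horizontal ones.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def gradients(image):
    """Return gradient magnitude and angle from Sobel correlation."""
    gx = correlate(image.astype(float), SOBEL_X)
    gy = correlate(image.astype(float), SOBEL_Y)
    return np.hypot(gx, gy), np.arctan2(gy, gx)

img = np.zeros((8, 8))
img[:, 4:] = 10.0              # a vertical step edge
mag, ang = gradients(img)
```

Away from the step the magnitude is zero; along the step it peaks, with an angle of 0 (a horizontal gradient).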
On a gradient image, edges have a strong center with weaker gradients on either side. To extract only the central (true) edge, it is useful to perform non-maxima suppression: a gradient is set to zero if one of its neighbors in the gradient direction is stronger than it. After additionally binarizing the image with a threshold, it becomes a good representation of the edges in an image.
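The suppression step might look as follows; this simplified sketch quantizes the angle to the nearest multiple of π/4 to select the two opposing 8-neighbors, and simply skips boundary pixels:

```python
import numpy as np

def non_max_suppress(mag, ang):
    """Zero out any gradient with a stronger neighbor along the
    (quantized) gradient direction."""
    out = np.zeros_like(mag)
    # Round the angle to the nearest multiple of pi/4; each of the four
    # classes covers both opposing 8-neighbors along one axis.
    q = np.round(ang / (np.pi / 4)).astype(int) % 4
    offsets = {0: (0, 1), 1: (1, 1), 2: (1, 0), 3: (1, -1)}
    h, w = mag.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dy, dx = offsets[q[y, x]]
            if (mag[y, x] >= mag[y + dy, x + dx]
                    and mag[y, x] >= mag[y - dy, x - dx]):
                out[y, x] = mag[y, x]
    return out

# A blurred vertical edge: three columns of response, strongest in the middle.
mag = np.zeros((7, 7))
mag[:, 3], mag[:, 4], mag[:, 5] = 5.0, 10.0, 5.0
ang = np.zeros((7, 7))          # horizontal gradient everywhere
thin = non_max_suppress(mag, ang)
```

Only the central column survives, thinning the edge to one pixel.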
A Hough transformation can be used to find the equations of likely lines in the edge image. Lines can be defined in polar coordinates as ρ = x cos θ + y sin θ, where (ρ, θ) defines the vector from the origin that meets the line perpendicularly. This θ is therefore the gradient angle of the line, so the gradient angle at an edge pixel should predict θ to within a small error. Given x, y, and θ, there is only one possible ρ. In a discretized parameter space, therefore, every edge point can vote for the (ρ, θ) pairs whose θ is close to its gradient angle. As θ moves further from the gradient angle, it is less likely to belong to the true line, so it is sensible to give it fewer votes, for example by weighting the votes with a Gaussian.
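A sketch of this gradient-guided, weighted voting; the 5-θ window and the 4/2/1 weights follow the results section, while the bin sizes are illustrative:

```python
import numpy as np

def hough_vote(points, grad_angles, diag, n_theta=180):
    """Accumulate votes for (rho, theta) pairs, restricting each point's
    votes to a window of thetas around its gradient angle, with a
    Gaussian-like falloff in the weights."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((2 * diag, n_theta))     # rho ranges over [-diag, diag)
    weights = [1, 2, 4, 2, 1]               # center theta gets the most
    for (x, y), g in zip(points, grad_angles):
        center = int(round((g % np.pi) / np.pi * n_theta)) % n_theta
        for k, w in zip(range(-2, 3), weights):
            ti = (center + k) % n_theta
            t = thetas[ti]
            rho = int(round(x * np.cos(t) + y * np.sin(t)))
            acc[rho + diag, ti] += w
    return acc

# Ten edge pixels on the vertical line x = 5; their gradient is horizontal.
pts = [(5, y) for y in range(10)]
acc = hough_vote(pts, [0.0] * 10, diag=20)
rho_i, theta_i = np.unravel_index(acc.argmax(), acc.shape)
```

The peak lands at θ = 0 and ρ = 5 (offset by the diagonal), recovering the line.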
The resulting accumulator space generally contains high-vote points surrounded by a neighborhood of decreasing votes, caused by the uncertainty in the best line to fit an edge. Whenever one of these points is selected as a real line, the nearby candidates should be removed with it. This raises the question of which points should count as neighbors. A good way to think of it is that anything 'downhill' from the selected point is unwanted, while any local maximum should be preserved, though permitting a neighbor to have a few extra votes can be beneficial. For this neighbor-growing algorithm to work, the parameter space must first be Gaussian filtered.
The Hough transformation finds lines, but line segments are the desired result. One way to find the endpoints is to step along a line in the edge image and select as endpoints all pixels that are on an edge but are adjacent, along the line, to a non-edge pixel. Unfortunately, the edges are unlikely to align perfectly, so this is a good way to get a large number of extremely short segments. The problem can be alleviated by first dilating the edge image, though this causes new problems in any region with many closely concentrated edges, such as bricks, since dilating can mark most of the pixels there as edges. An additional way to join short segments is to list the run lengths along the line (for example: 524, 4, 35, 5, 2, 5, 96, 69, 2, 23, 1, 62, 1, 3, 3, 11, 1) and unify any sequence of three numbers in which the center number is very small (like 69, 2, 23).
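The merging rule on the run-length list might look like this (the gap threshold is an assumption, and `condense` is a hypothetical helper name):

```python
def condense(runs, tiny=3):
    """Collapse any triple (a, b, c) of run lengths whose middle value is
    tiny into a single run of length a + b + c."""
    out = list(runs)
    i = 1
    while i < len(out) - 1:
        if out[i] <= tiny:
            out[i - 1:i + 2] = [out[i - 1] + out[i] + out[i + 1]]
            i = max(i - 1, 1)      # re-check around the merged run
        else:
            i += 1
    return out
```

On the example above, the triple 69, 2, 23 collapses into a single run of 94, joining the two edge runs across the tiny gap.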
This program was implemented in Python using NumPy arrays.
Computing the gradients was made more efficient by correlating with two 1-D filters instead of one 2-D filter.
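This separability can be checked directly, assuming SciPy: the Sobel x-kernel factors into a smoothing column [1, 2, 1] and a derivative row [-1, 0, 1].

```python
import numpy as np
from scipy.ndimage import correlate, correlate1d

img = np.arange(64, dtype=float).reshape(8, 8) ** 1.5  # arbitrary test image

# Full 2-D Sobel kernel as the outer product of its two 1-D factors.
sobel_x = np.outer([1.0, 2.0, 1.0], [-1.0, 0.0, 1.0])
full = correlate(img, sobel_x)

# Separable version: smooth down the columns, then differentiate along rows.
sep = correlate1d(correlate1d(img, [1.0, 2.0, 1.0], axis=0),
                  [-1.0, 0.0, 1.0], axis=1)
```

Two 1-D passes cost on the order of 2k multiplies per pixel instead of k² for the full k×k kernel.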
Non-maximal gradients are suppressed by rounding the gradient angle to the nearest multiple of π/4 radians and examining the two corresponding 8-neighbors of each pixel.
The parameter space is a two-dimensional NumPy array whose height is twice the diagonal of the image (allowing for negative and positive values of ρ) and whose width spans 2π (from -π to π). Only two quadrants of the space are actually used, so it would also have been possible to convert every θ to lie between 0 and π.
Finding neighbors is achieved by examining the 8-neighbors of a pixel, pushing onto a stack every location whose value is less than (or only slightly more than) the current one, and repeating the process for each location popped from the stack. Finding all the neighbors of the first few lines is time consuming, since it effectively thresholds the image.
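A sketch of this stack-based 'downhill' growth; the `slack` term is the small allowance mentioned earlier, and the values are illustrative:

```python
import numpy as np

def downhill_neighbors(acc, start, slack=0):
    """Collect every cell reachable from `start` through 8-neighbors whose
    vote count is at most the current cell's count plus `slack`."""
    h, w = acc.shape
    seen = {start}
    stack = [start]
    while stack:
        y, x = stack.pop()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (dy or dx) and 0 <= ny < h and 0 <= nx < w:
                    if (ny, nx) not in seen and acc[ny, nx] <= acc[y, x] + slack:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return seen

# Two peaks (5 and 7) separated by a valley: growing downhill from the
# larger peak must not swallow the smaller local maximum.
peak = np.array([[5, 1, 0, 1, 7]], dtype=float)
region = downhill_neighbors(peak, (0, 4))
wide = downhill_neighbors(peak, (0, 4), slack=2)
```

With no slack the region stops at the valley floor; with a small slack it climbs slightly, but the other local maximum is still preserved.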
When finding line segments, I traverse the line (in either the x or the y direction, depending on whether the slope is greater or less than 1), starting with a boolean online set to false. While the current pixel matches the online state, a run length is incremented; when it does not, online is flipped, the length is appended to a list, and the counter is reset to 1. The list is then condensed, and endpoints are found by counting along it.
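The traversal's bookkeeping amounts to run-length encoding of the edge/non-edge sequence along the line; a simplified sketch, with the actual pixel stepping omitted:

```python
def run_lengths(on_edge):
    """Turn the sequence of booleans sampled along the line into a list of
    alternating run lengths (gap, edge run, gap, ...)."""
    runs = []
    current, length = on_edge[0], 1
    for p in on_edge[1:]:
        if p == current:
            length += 1
        else:
            runs.append(length)
            current, length = p, 1
    runs.append(length)
    return runs

samples = [False, True, True, True, False, False, True]
runs = run_lengths(samples)
```

Endpoint positions then follow from cumulative sums over the condensed list.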
Bilateral vs Gaussian Filtering
The bilateral filter picks up the brick edges far more strongly than the Gaussian filter does, making it much sloppier at detecting endpoints.
3×3 vs 5×5 Sobel filters
All else was kept the same (using Gaussian smoothing), except that I doubled the gradient threshold for the 5×5 filter to account for its greater magnitudes.
Equal vs Gaussian Voting
In the first image, points voted equally for lines with 5 different values of θ in the parameter space. In the second, they cast 4 votes for the center θ, 2 for each adjacent θ, and 1 for each distant one. When tuning parameters for the two systems, the minimum number of votes required to count a line is half as large for equal voting as for Gaussian voting (8 vs 17), and the amount by which a neighbor is permitted to exceed its predecessor is also halved (4 vs 7), though these may not be the absolute best parameters.
Smoothing with a Gaussian, using a 3×3 Sobel filter, and implementing Gaussian voting all help to make the Hough transformation more effective at finding straight lines. Many lines are detected, but lines in areas of low contrast, such as between the columns and the white building, are not always found. Most of the line segments do indeed cover only their edges. Many windows, however, are still not completely outlined, or the vertical lines extend slightly beyond the horizontal ones and vice versa. In architecture, rectangles are nearly as common as lines, so the program's functionality could be increased by searching for corner locations and adjusting the two endpoints to coincide.
Finding lines in an image can be useful for measuring high points on a building. In this image, however, not all points lie in the same plane, so no single conversion factor applies perfectly to the entire image. The factor I chose was a compromise between the window plane and the column plane. Taking the picture with a telephoto lens could alleviate this problem.
If the goal is to obtain a line drawing of the building, then the shadow is a distracting factor. There are methods of removing shadows from images, so running the image through such a function could solve this problem.