For this project, the main goal was to read in csv files whose columns are labeled by headers, and to categorize and manipulate that data. We had a file called data.py that read in the file and stored its contents in lists, dictionaries, and matrices. Then we made some accessor methods to get that information back out. In the analysis file, we had a few methods, such as one to calculate the mean of one or more matrix columns and one to normalize the columns separately. The end result was a set of summaries printed in the terminal that tell us about the data from a given file.
*Work in progress, obviously*
Task 1 was to create the algorithm for reading the file in, and to then make the accessor methods.
I read in the file by using the csv package's reader. I split the text on commas and made sure to skip any comment lines in the csv files. Then I looped over all of the lines in the file (which we will call rows). The first row is the header and the second row is the types, so I checked for those two indices first in the loop. I used a variable to keep track of which column I was on during various parts of the loop, which was useful when building the dictionaries. Any row that reaches the final else branch is neither a header nor a type row, so it must be a regular data row, and for those I also needed to check whether each value was numeric. I did this by trying to convert the string to a float and catching the exception: if the conversion works, the value is numeric, and otherwise it isn't. After doing all this, I made the accessor methods, which were mostly a matter of returning lists that we'd already built, or else manipulating them a bit. The one that gave me trouble was get_data, because I was trying to fetch the columns separately and then combine those matrices. In the end I got help from Stephanie, who told me to first make a zero matrix the size of the return matrix and then fill it in. This was much easier. Incidentally, here is a snippet of the start of my reading loop:
Note that "rU" here stands for "read universal", meaning read mode with universal newlines. It prevents errors when reading csv files that were saved with different line endings.
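The reading steps and the zero-matrix trick for get_data can be sketched roughly as follows. This is only a minimal sketch under assumptions, not the actual data.py: the names read_table and header2col are made up for illustration, comments are assumed to start with "#", and non-numeric cells are assumed to be stored as NaN. On Python 3 the "U" flag is no longer needed (it was removed in Python 3.11), and the csv documentation recommends opening the file with newline="" instead.

```python
import csv
import io

import numpy as np


def read_table(lines):
    """Parse csv lines into headers, types, a header->column dict, and a matrix.

    Assumed layout: first non-comment row = headers, second = types,
    everything after that = data. Lines starting with '#' are comments.
    """
    headers, types, rows = [], [], []
    header2col = {}
    row_index = 0  # counts only non-comment, non-blank rows
    for row in csv.reader(lines):
        # Skip blank lines and comment lines.
        if not row or row[0].strip().startswith("#"):
            continue
        cells = [cell.strip() for cell in row]
        if row_index == 0:
            headers = cells
            header2col = {name: col for col, name in enumerate(cells)}
        elif row_index == 1:
            types = cells
        else:
            numeric_row = []
            for cell in cells:
                # If float() works the value is numeric, otherwise it isn't.
                try:
                    numeric_row.append(float(cell))
                except ValueError:
                    numeric_row.append(np.nan)  # assumption: mark as missing
            rows.append(numeric_row)
        row_index += 1
    return headers, types, header2col, np.array(rows, dtype=float)


def get_data(matrix, header2col, wanted):
    """Return the requested columns by filling a pre-sized zero matrix."""
    out = np.zeros((matrix.shape[0], len(wanted)))
    for j, name in enumerate(wanted):
        out[:, j] = matrix[:, header2col[name]]
    return out


# Usage with an in-memory file; a real file would be opened with
# open(path, newline="") and passed in the same way.
sample = io.StringIO("# a comment\nheight,weight\nnumeric,numeric\n1.5,10\n2.5,abc\n")
headers, types, header2col, matrix = read_table(sample)
weights = get_data(matrix, header2col, ["weight"])
```

Building the result matrix at its final size first means each requested column is just copied into its slot, so there is no need to stack separate column matrices afterwards.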
For task 2, we needed to create an analysis.py file, which was just a file full of methods. We had a method to get the range of the data (i.e. the min and max) of the provided columns, a method to calculate their means, one for their standard deviations, and methods to normalize the columns both individually and relative to each other in a matrix fashion. NumPy has built-in methods for getting the mean, max, min, and standard deviation (although for the standard deviation I did need to specify 1 degree of freedom, i.e. ddof=1, to get the sample standard deviation), so the first three functions were just a matter of getting the headers' info from our dictionary, using that to get the correct matrix columns, and applying the operation to them.
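The first three analysis functions described above could look something like this. It is a sketch under assumptions rather than the real analysis.py: the function names and the header2col dictionary (header name to column index) are hypothetical, while min, max, mean, and std with ddof=1 are standard NumPy array methods.

```python
import numpy as np


def select_columns(matrix, header2col, headers):
    """Look up each header's column index and pull those columns out."""
    cols = [header2col[h] for h in headers]
    return matrix[:, cols]


def data_range(matrix, header2col, headers):
    """Return one [min, max] pair per requested column."""
    block = select_columns(matrix, header2col, headers)
    return np.column_stack((block.min(axis=0), block.max(axis=0)))


def mean(matrix, header2col, headers):
    """Return the mean of each requested column."""
    return select_columns(matrix, header2col, headers).mean(axis=0)


def stdev(matrix, header2col, headers):
    """Return the sample standard deviation (ddof=1) of each column."""
    return select_columns(matrix, header2col, headers).std(axis=0, ddof=1)


# Small made-up dataset with two columns, "a" and "b".
matrix = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
header2col = {"a": 0, "b": 1}

ranges = data_range(matrix, header2col, ["a", "b"])
means = mean(matrix, header2col, ["a", "b"])
stdevs = stdev(matrix, header2col, ["a"])
```

With ddof=1 NumPy divides by N - 1 instead of N, which is the usual choice when the rows are a sample rather than the whole population.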
On the other hand, I had a different approach to normalizing the columns. For normalizing them together, I first made a zero matrix, similar to what I did in the get_data method from before. Then I looped over all of the columns and got their max and min matrices.
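One way the two normalizations could work is sketched below. This is an assumption-heavy illustration: it assumes min-max scaling to the range [0, 1], that "together" means using a single overall min and max across all of the selected columns, and that a constant column should map to all zeros. The function names are hypothetical.

```python
import numpy as np


def normalize_columns_separately(block):
    """Scale each column to [0, 1] using that column's own min and max."""
    block = np.asarray(block, dtype=float)
    mins = block.min(axis=0)
    spans = block.max(axis=0) - mins
    spans[spans == 0] = 1.0  # assumption: constant columns become all zeros
    return (block - mins) / spans


def normalize_columns_together(block):
    """Scale every column using one shared min and max."""
    block = np.asarray(block, dtype=float)
    out = np.zeros(block.shape)  # zero matrix of the final size
    n_cols = block.shape[1]
    # Collect each column's min and max, then reduce to overall values.
    lo = min(block[:, j].min() for j in range(n_cols))
    hi = max(block[:, j].max() for j in range(n_cols))
    span = hi - lo
    if span == 0:
        span = 1.0  # assumption: avoid dividing by zero
    for j in range(n_cols):
        out[:, j] = (block[:, j] - lo) / span
    return out


block = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
separate = normalize_columns_separately(block)
together = normalize_columns_together(block)
```

Normalizing separately makes every column span the full 0 to 1 range, while normalizing together keeps the columns' sizes relative to each other.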
Extensions: converting dates and converting more date formats...
Learned: Some file-reading strategies, some numpy stuff...Kept track of various...variables...throughout the loop.
Thanks to: Melody, Stephanie