For this project, the main goal was to read in CSV files whose columns are labeled by a header row, and then to categorize and manipulate that data. We had a file called data.py that read in the file and stored the information in lists, dictionaries, and matrices. Then we wrote some accessor methods to get that information back out. In the analysis file, we had a few methods, such as one to calculate the mean of one or more matrix columns, and one to normalize each column separately. The end result was a set of summaries printed to the terminal that describe the data in a given file.
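As a rough illustration of the two analysis methods mentioned above, here is a minimal sketch. The function names, the use of plain numpy arrays, and the choice of min-max scaling for "normalize" are all assumptions made for the example, not necessarily what the project's analysis file does.

```python
import numpy as np


def column_means(matrix, cols):
    """Mean of each selected column (hypothetical helper name)."""
    return matrix[:, cols].mean(axis=0)


def normalize_columns_separately(matrix, cols):
    """Scale each selected column to [0, 1] using its own min and max.

    Assumption: "normalize" means min-max scaling, done per column.
    """
    selected = matrix[:, cols].astype(float)
    mins = selected.min(axis=0)
    ranges = selected.max(axis=0) - mins
    # Assumption: a constant column becomes all zeros instead of dividing by zero.
    ranges[ranges == 0] = 1.0
    return (selected - mins) / ranges


# Small made-up matrix: three rows, three columns.
data = np.array([[1.0, 10.0, 5.0],
                 [2.0, 20.0, 5.0],
                 [3.0, 30.0, 5.0]])

means = column_means(data, [0, 1])
normed = normalize_columns_separately(data, [0, 2])
print(means)
print(normed)
```

Because the minimum and range are computed with `axis=0`, each column is scaled independently of the others, which is the "separately" part.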
*Work in progress, obviously*
Task 1 was to write the algorithm for reading the file in, and then to write the accessor methods.
I read in the file using the csv module's reader, splitting each line on commas and skipping any comments in the CSV file. Then I looped over all of the lines (which we will call rows). The first row is the header and the second row holds the types, so I checked for those two indices first in the loop. I used a variable to keep track of which column I was on during various parts of the loop, which was useful when building the dictionaries. Anything that reaches the final else branch is a regular data row, and there I also needed to check whether each value was numeric. I did this by trying to convert the string to a float and catching the exception: if the conversion succeeds, the value is numeric; otherwise, it isn't.
After doing all this, I wrote the accessor methods, which were mostly a matter of returning lists that we'd already built, or else manipulating them a bit. The one that gave me trouble was get data, as I was trying to fetch the columns separately and then combine those matrices. In the end I got help from Stephanie, who told me to first make a matrix of zeros with the size of the return matrix and then fill it in. This was much easier.
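The reading loop and the zero-matrix approach described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual data.py: the names `read_table`, `select_columns`, and `header2col` are hypothetical, comment lines are assumed to start with `#`, and non-numeric cells are assumed to be stored as NaN in the numeric matrix.

```python
import csv
import io

import numpy as np


def is_numeric(text):
    """Return True if the string converts to a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def read_table(lines):
    """Parse header row, type row, and data rows from an iterable of lines."""
    headers, types, rows = [], [], []
    header2col = {}
    # Assumption: comment lines start with '#'; blank lines are skipped too.
    kept = (ln for ln in lines if ln.strip() and not ln.lstrip().startswith("#"))
    for i, row in enumerate(csv.reader(kept)):
        row = [cell.strip() for cell in row]
        if i == 0:
            # First row: header names, plus a name -> column index dictionary.
            headers = row
            header2col = {name: col for col, name in enumerate(row)}
        elif i == 1:
            # Second row: declared type of each column.
            types = row
        else:
            # Regular data row. Assumption: non-numeric cells become NaN.
            rows.append([float(c) if is_numeric(c) else np.nan for c in row])
    return headers, types, header2col, np.array(rows, dtype=float)


def select_columns(matrix, header2col, names):
    """Build the result by filling a pre-allocated zero matrix column by column."""
    out = np.zeros((matrix.shape[0], len(names)))
    for j, name in enumerate(names):
        out[:, j] = matrix[:, header2col[name]]
    return out


# Made-up sample standing in for a file on disk.
sample = io.StringIO(
    "# sample file\n"
    "height, weight, label\n"
    "numeric, numeric, string\n"
    "1.5, 2.0, a\n"
    "3.0, 4.5, b\n"
)
headers, types, header2col, matrix = read_table(sample)
subset = select_columns(matrix, header2col, ["weight", "height"])
print(headers, types)
print(subset)
```

Allocating the result with `np.zeros` first means each requested column is written straight into its final position, so there is no need to stitch separate column matrices together afterward.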
Learned: Some file-reading strategies, some numpy stuff...
Thanks to: Melody, Stephanie