For this project, we created a way for the user to perform principal component analysis (PCA) on the data. That is, the user chooses which columns (two or more headers) to analyze, and we compute the corresponding eigenvectors, project the data onto them, and let the user view the result. The first two or three eigenvectors become the new bases (axes) of the data, and we save each analysis so the user can later plot any of them from a list of choices. My program lets the user create any number of PCAs from an open file and then open a window that lets them plot a PCA or look at its eigenvectors and other information. I also allow the user to name each analysis if they choose (the default name is the list of chosen headers).
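To make the projection idea concrete, here is a minimal NumPy sketch of PCA on a subset of columns. It is not the provided course code: the function name pca_sketch, its parameters, and the optional min-max normalization are assumptions. It takes the SVD of the centered data, whose right singular vectors are the same eigenvectors an eigen-decomposition of the covariance matrix would give, and projects the data onto the first few of them.

```python
import numpy as np


def pca_sketch(data, col_indices, n_components=3, normalize=True):
    """Project the chosen columns onto their leading eigenvectors.

    Hypothetical helper: `data` is a 2-D array (rows = points) and
    `col_indices` picks the headers the user selected.
    """
    A = np.asarray(data, dtype=float)[:, col_indices]
    if normalize:
        # Min-max scale each column to [0, 1] so no header dominates by its units.
        mins = A.min(axis=0)
        ranges = A.max(axis=0) - mins
        ranges[ranges == 0] = 1.0  # avoid dividing by zero on constant columns
        A = (A - mins) / ranges
    means = A.mean(axis=0)
    D = A - means  # center each column on zero
    # Rows of Vt are the covariance eigenvectors, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(D, full_matrices=False)
    eigenvalues = S ** 2 / (A.shape[0] - 1)  # variance along each eigenvector
    projected = D @ Vt[:n_components].T  # coordinates in the eigenvector basis
    return projected, Vt, eigenvalues, means
```

Because the SVD returns singular values in descending order, the first rows of Vt are already the eigenvectors with the largest eigenvalues, so no extra sort is needed.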
The first task was to make a window where the user can choose which headers to use in the PCA. I made a class called PCASelection that has a listbox of all the headers from the opened data; the user can select multiple headers before pressing OK. I get all of the selected indices and make a new PCAData object with the corresponding headers. Then I store this object in a list in the display class, which I use later to access information about individual analyses. Each PCAData instance is made by the analysis class's pca method. In this method, I follow the structure of the provided code to project the data onto its first three eigenvectors, and I store the result in the PCAData instance I return. When it's time to plot this on the canvas, I normalize it again in buildPCAPoints, which I call from the apply method of the PCAView window class. The task also calls for the user to be able to delete an analysis. To do this, I added a delete button; if the user presses it with one or more items highlighted, I call the listbox's delete() method on the selected index or indices and also delete the corresponding data from the list of PCAs.
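One detail worth spelling out for the delete button: when several items are highlighted, the indices should be deleted from highest to lowest, or each deletion shifts the later items and the remaining indices point at the wrong analyses. The sketch below shows this on a plain Python list. The name delete_selected is hypothetical; in the GUI the indices would come from the listbox's curselection() and each removal would be paired with listbox.delete(index).

```python
def delete_selected(analyses, selected_indices):
    """Remove the analyses at the given listbox indices, keeping the list in sync.

    Hypothetical helper: `analyses` mirrors the listbox entries one-to-one,
    and `selected_indices` is what Listbox.curselection() would return.
    """
    # Go from the highest index down: deleting a lower index first would shift
    # every later item, so the remaining indices would hit the wrong entries.
    for index in sorted(selected_indices, reverse=True):
        del analyses[index]
        # In the GUI, the matching call would be listbox.delete(index).
    return analyses
```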
The second task was to let the user view the projected data. I call the buildPCAPoints method to plot it. In retrospect, I could have used a single point-building method for every kind of plot, but I wrote this separate method anyway. The only difference is that when a normal data plot is made, I set the opened-data field (i.e., the data we are currently looking at) back to the original data. This lets us keep the original data in case we want to plot everything normally. When a PCAData object is plotted, the opened data becomes the selected data from the list of PCAs.
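The second normalization can be sketched as a per-column min-max rescaling of the projected coordinates. This is an assumption about what buildPCAPoints does, not its actual code, and normalize_for_canvas is a hypothetical name. The reason it is needed: projected PCA coordinates are centered on zero and can be negative, so they must be rescaled before being mapped to canvas pixels.

```python
import numpy as np


def normalize_for_canvas(projected, n_axes=3):
    """Min-max scale the first n_axes projected columns to [0, 1] for plotting.

    Hypothetical stand-in for the second normalization done before drawing.
    """
    P = np.asarray(projected, dtype=float)[:, :n_axes]
    mins = P.min(axis=0)
    ranges = P.max(axis=0) - mins
    ranges[ranges == 0] = 1.0  # a flat axis stays at 0 instead of dividing by zero
    return (P - mins) / ranges
```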
Below are screenshots of the window for choosing which headers go into the PCA and the window for choosing which PCA to plot.
The third task was to make a dialog window that shows all of the eigenvectors, eigenvalues, and energies of the selected PCA. In my case, the user can press a button (shown above) to view the eigenvectors of a selected PCA. This opens a new window containing a table of labels, built in the body method of the class ShowEigenvectors. There, I get the PCA headers (PCA00, PCA01, etc.), the selected header names (the ones from the original data), the eigenvectors of the PCA, and the eigenvalues. I also build a list of energies by looping through the eigenvalues and dividing each one by the sum of all the eigenvalues, so each energy is that eigenvector's fraction of the total variance. Then I set up some convenience variables: the number of rows (one per eigenvalue, plus an extra row for column names), the number of columns (the number of selected headers plus 3 for the eigenvector name, eigenvalue, and energy), the number of decimal places to round to, and the width of each label. Finally, I loop through the rows and columns and create labels whose contents depend on the current row, column, or combination of both. The dialog window is pictured below:
Notice that every other row has a light blue background. I did this by checking whether the row number is even or odd (for an even row, rowNum % 2 is 0).
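The table logic can be sketched without Tk by building the cell text and row colors as plain lists. The helper name build_eigen_table, the column titles ("E-vec", "E-val", "Energy"), and the default of 4 decimal places are assumptions, not taken from ShowEigenvectors.

```python
def build_eigen_table(pca_headers, data_headers, eigenvectors, eigenvalues, places=4):
    """Return (cells, backgrounds) for the eigenvector dialog.

    Hypothetical helper: one title row, then one row per eigenvector with its
    name, eigenvalue, energy, and its component for each selected data header.
    Each entry of `eigenvectors` is one eigenvector.
    """
    total = sum(eigenvalues)
    energies = [value / total for value in eigenvalues]  # fraction of total variance

    rows = len(eigenvalues) + 1   # +1 for the column-name row
    cols = len(data_headers) + 3  # +3 for name, eigenvalue, energy
    cells = [["" for _ in range(cols)] for _ in range(rows)]
    cells[0][:3] = ["E-vec", "E-val", "Energy"]
    cells[0][3:] = list(data_headers)

    for r in range(1, rows):
        i = r - 1  # data rows start one below the title row
        cells[r][0] = pca_headers[i]
        cells[r][1] = str(round(eigenvalues[i], places))
        cells[r][2] = str(round(energies[i], places))
        for c, component in enumerate(eigenvectors[i]):
            cells[r][3 + c] = str(round(component, places))

    # Even rows (r % 2 == 0) get the light blue background; odd rows stay white.
    backgrounds = ["light blue" if r % 2 == 0 else "white" for r in range(rows)]
    return cells, backgrounds
```

In the dialog, each cells[r][c] would become the text of the label gridded at row r, column c, with backgrounds[r] as its background color.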
The picture below is for task 4, which asks me to show that my program works. I've plotted a PCA of the premin, premax, salmin, salmax, minairtemp, maxairtemp, minsst, maxsst, minsoilmoist, maxsoilmoist, and runoffnew headers from the Australia data set.
Task 5 was to... name my program. The name I chose is:
Interesting Menus Show Off Beautiful Except Hellishly Integrated New Data aka "IMSOBEHIND"
For an extension, as mentioned above, I gave the user the ability to name their PCA. This part of the interface is visible in one of the screenshots above. The user can type a name into a tk.Entry field, and when they press OK, the name, if one was given, is stored along with the data in the list of analyses. Then, when we list the PCAs in the selection window, each one is listed under its name if its list entry has an extra element holding a name. Duplicate names are fine, because the name exists only for the user's convenience; the program never depends on it and always works with the underlying data.
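A minimal sketch of the naming extension, assuming each stored analysis is a list holding the PCAData object and its chosen headers, with the name appended only when the Entry field was not blank. The names store_analysis and display_name and this storage layout are hypothetical; in the GUI, name would be the string returned by the tk.Entry widget's get() method.

```python
def store_analysis(analyses, pca_data, headers, name=""):
    """Append one analysis as [pca_data, headers] or [pca_data, headers, name].

    Hypothetical storage layout for the list of analyses.
    """
    entry = [pca_data, list(headers)]
    name = name.strip()
    if name:  # only store a name if the user actually typed one
        entry.append(name)
    analyses.append(entry)


def display_name(entry):
    """Label shown in the PCA list: the custom name if present, else the headers."""
    if len(entry) > 2:
        return entry[2]
    # Default label: the chosen headers joined with commas.
    return ", ".join(entry[1])
```

Since display_name is only used for the label, two entries with the same name still point at different data objects.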
I learned a lot from this project. I had to reorganize much of my code, such as how to store the opened data and how to tell what kind of data it is. I also rewrote my get_numeric_headers method in the Data class, which had a problem: sorted() orders number strings as text rather than as numbers. To fix this, I added an optional argument that, if true, sorts the headers differently, by the values of the header dictionary (which are numeric and already in the same order as the PCA column names). The biggest problem I had was forgetting to normalize the data again for the canvas, because I didn't understand that the first normalization was only for expressing the data in terms of the eigenvectors. Stephanie helped me a lot with this problem.
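The sorting issue can be illustrated with a small sketch: sorted() compares strings character by character, so a header like "PCA10" comes before "PCA2". Ordering the names by the column indices stored in the header dictionary avoids this. The helper name ordered_headers, the by_column flag, and the example headers are hypothetical, not the actual get_numeric_headers code.

```python
def ordered_headers(header2col, by_column=False):
    """Return header names, optionally ordered by their stored column index.

    Hypothetical helper: `header2col` maps header name -> column index.
    """
    if by_column:
        # Sort by the dictionary's numeric values instead of the name strings.
        return sorted(header2col, key=header2col.get)
    # Plain string sort: compares character by character.
    return sorted(header2col)
```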
Very Special Thank You: Stephanie
Routinely Grateful To: Melody