Summary:

The main purpose of this project was to incorporate principal component analysis (PCA) into my GUI.  Integrating the PCA feature involved three things: 1) enabling my GUI to execute PCA on a loaded data set, 2) showing the user the eigenvectors and eigenvalues of a selected analysis, and 3) displaying the projected data based on the analysis.  To do so, I needed to complete the following tasks: a) create a function in the analysis class that conducts PCA, b) create more child classes of Dialog that let the user select features for PCA, display the results of the analysis, and give the user lists of columns to plot when projecting the first three eigenvectors, and finally c) project the PCA data based on the user's selection of columns.  The first task was straightforward, as it simply required following the instructions of the lab.  The second one needed more planning of how to lay out the dialog window and a good amount of coding.  The last task turned out to be small but still tricky; it involved slightly modifying the buildPoints() function in the display class.  After completing the tasks, my GUI now has a PCA feature along with linear regression.  Please refer to the instructions below to use my updated GUI.
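As a rough illustration of task (a), here is a minimal sketch of a PCA function, assuming Python with NumPy and a singular value decomposition of the mean-centered data.  The function name pca(), the normalize option, and the return layout are hypothetical and are not taken from my analysis class.

```python
import numpy as np

def pca(data, normalize=True):
    """Hypothetical PCA helper for an (N rows, F features) numeric array.

    Returns (mean, std, eigenvalues, eigenvectors, projected), where each
    row of `eigenvectors` is one principal direction.
    """
    A = np.asarray(data, dtype=float)
    mean = A.mean(axis=0)
    # Assumption: optionally scale each feature by its sample standard deviation.
    std = A.std(axis=0, ddof=1) if normalize else np.ones(A.shape[1])
    std[std == 0] = 1.0  # leave constant columns unscaled
    D = (A - mean) / std

    # SVD of the centered data: rows of Vt are eigenvectors of the covariance.
    U, S, Vt = np.linalg.svd(D, full_matrices=False)
    eigenvalues = S ** 2 / (A.shape[0] - 1)

    # Project the centered data onto the eigenvectors.
    projected = D @ Vt.T
    return mean, std, eigenvalues, Vt, projected
```

The eigenvalues come back sorted from largest to smallest, so the first three columns of the projected data correspond to the first three eigenvectors.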


Instructions for PCA:

Follow the instructions below to conduct PCA on the data set 'AustraliaCost.csv'.  I will explain how I implemented my extensions later.

  1. Before performing PCA on a data set, the user must first open a file and read it by pressing 'Control-o'.  The user should then select the loaded data set in the first list box of the right control frame and click the button 'Perform PCA'.  If the user does not follow these instructions, the error messages below (Figure 1 and Figure 2) may show up.

     (Figure 1)  (Figure 2)

  2. Once the user reads in a data set and performs PCA, a dialog window with a list of the original features should show up, as in Figure 3.  The user should then select the features for PCA.

     (Figure 3)

  3. After selecting the features for PCA and clicking the 'ok' button, the user will see another dialog window (Figure 4) in which the analysis can be named.  This is an extension.

     (Figure 4)

  4. After naming the analysis, the user should see the analysis listed in the second list box.  In my case, I named the analysis 'Australia_PCA'.  See Figure 5 for an illustration.

     (Figure 5)

  5. Once PCA is executed, the user has a variety of options.  Whatever the user decides to do, the analysis must first be selected in the second list box.  The user can see the eigenvectors and eigenvalues of the analysis by clicking the button 'PCA Results'.  Figure 6 shows the result of running PCA with the following features: premin, premax, salmin, salmax, minairtemp, maxairtemp, minsst, maxsst, minsoilmoist, maxsoilmoist, and runoffnew.

     (Figure 6)

  6. The user can also project the data onto the first three eigenvectors in the GUI by clicking the button 'Project PCA'.  When the button is clicked, a dialog window (Figure 7) will show up, in which the user can select which eigenvectors to project on which axes.  As an extension, I enabled the user to pick the columns to plot and select up to five columns (x, y, z, color, size).

     (Figure 7)

  7. After selecting the first three eigenvectors (or more), the user can then project the PCA data.  I will show the plot of the AustraliaCost data using the first three eigenvectors in the next section.

  8. As another extension, I enabled the user to select columns intermixed from the original data and the PCA data.  To do so, the user should simply click the button 'Intermix PCA'.  A dialog window (Figure 8) containing the original and PCA features will then show up.  After making selections, the user can project the mixed data.

     (Figure 8)

  9. Lastly, the user can store the PCA analysis as a CSV file by clicking the button 'Save PCA' and load it later by clicking the button 'Load PCA'.  When saving, the name of the analysis is used as the name of the CSV file.  Note that loading a PCA analysis will not project the data right away.  It simply re-loads the stored analysis into the list box so that the user can see the results or project the data again.  When loading (Figure 9), the user must select the stored analysis, not the original data set.

     (Figure 9)
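The save and load behavior in step 9 can be sketched roughly as follows, assuming Python's standard csv module.  The function names save_pca() and load_pca(), the 'PCA0', 'PCA1', ... row labels, and the file layout (one row per eigenvector) are hypothetical; the file my GUI writes may hold different or additional information, such as the projected data.

```python
import csv
import os
import tempfile

def save_pca(name, headers, eigenvalues, eigenvectors, directory="."):
    """Write the analysis to '<name>.csv': one row per eigenvector (assumed layout)."""
    path = os.path.join(directory, name + ".csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["component", "eigenvalue"] + list(headers))
        for i, (value, vector) in enumerate(zip(eigenvalues, eigenvectors)):
            writer.writerow(["PCA%d" % i, value] + list(vector))
    return path

def load_pca(path):
    """Read a file written by save_pca(); nothing is plotted here."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    headers = rows[0][2:]
    eigenvalues = [float(r[1]) for r in rows[1:]]
    eigenvectors = [[float(x) for x in r[2:]] for r in rows[1:]]
    # The analysis name is recovered from the file name.
    name = os.path.splitext(os.path.basename(path))[0]
    return name, headers, eigenvalues, eigenvectors

# Example round trip with made-up numbers, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_pca("Australia_PCA", ["premin", "premax"],
                     [1.5, 0.5], [[0.6, 0.8], [-0.8, 0.6]], directory=tmp)
    loaded = load_pca(saved)
```

Keeping loading separate from plotting matches the behavior described above: the stored analysis only reappears in the list box, and projection remains a separate action.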

 

Required Images & Analysis:

Using the Australia Coast data set and computing PCA on the following columns: premin, premax, salmin, salmax, minairtemp, maxairtemp, minsst, maxsst, minsoilmoist, maxsoilmoist, and runoffnew, I obtained the plot below (Figure 10) after projecting the data onto the first three eigenvectors.

 (Figure 10)
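The column selection behind 'Project PCA' and 'Intermix PCA' can be pictured with the sketch below, written in plain Python.  It is only an illustration under assumptions: the function intermix_columns(), the dictionary inputs, and the 'PCA0', 'PCA1' column names are hypothetical and do not describe how my buildPoints() function is actually written.

```python
AXES = ("x", "y", "z", "color", "size")

def intermix_columns(original, projected, selection):
    """Build one tuple per data row in (x, y, z, color, size) order.

    original, projected: dicts mapping a column name to its list of values.
    selection: dict mapping an axis name to a column name; axes left out
    of the selection come back as None.
    """
    # Assumption: original and PCA column names do not collide.
    columns = {**original, **projected}
    chosen = [selection.get(axis) for axis in AXES]
    missing = [h for h in chosen if h is not None and h not in columns]
    if missing:
        raise KeyError("unknown column(s): " + ", ".join(missing))

    n_rows = len(next(iter(columns.values())))
    rows = []
    for i in range(n_rows):
        rows.append(tuple(columns[h][i] if h is not None else None
                          for h in chosen))
    return rows

# Example with made-up values: two PCA columns mixed with two original ones.
original = {"premin": [1.0, 2.0], "salmax": [3.0, 4.0]}
projected = {"PCA0": [0.5, -0.5], "PCA1": [0.1, 0.2]}
rows = intermix_columns(original, projected,
                        {"x": "PCA0", "y": "PCA1", "z": "premin", "color": "salmax"})
```

Selecting only PCA columns for x, y, and z corresponds to the plain projection, while mixing in original columns corresponds to the intermix extension.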

For my own choice of data set, I chose the Iris data set and performed PCA using the following features: sepallength, sepalwidth, petallength, and petalwidth.

I decided that the name of my program is 'BAD', which stands for 'Basic Analysis of Data'.

Extensions:






 

Acknowledgements:

I received help from Professor Taylor and Maxwell in figuring out how to implement the intermix extension and interpret its results.