Zena's CS251 Project 5 (Zena Abulhab, CS251 - Data Analysis and Visualization)

For this project, we integrated linear regression into our program. The linear regression line, which fits the data points, is computed with a formula that minimizes the squared vertical distances between the points and the line. We needed to read in two columns of data, plot them, and draw such a line, which also had to move along with the data, as usual, when the user interacts with the space. I have a command in the top menu that lets the user run a linear regression. We also had to create a function in analysis.py that performs a multiple linear regression; it takes in two independent variables and one dependent variable and gives us information about the fit and distribution. My program also has a screenshot command in the menu that saves a picture of the current screen, as well as axis labels for both normal plotting and linear regression.
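As a rough illustration of how a regression line can be fit to two columns of data, here is a minimal least-squares sketch. It is not the code from the project: fit_line and its return values are hypothetical names, and it assumes the two columns are plain Python lists of numbers.

```python
def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line
    y = slope * x + intercept through the points (xs[i], ys[i])."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length columns with at least two points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squared deviations and cross-deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r^2 is the squared correlation; a constant y column is fit exactly.
    r_squared = (sxy * sxy) / (sxx * syy) if syy != 0 else 1.0
    return slope, intercept, r_squared
```

For example, fit_line([0, 1, 2, 3], [1, 3, 5, 7]) gives a slope of 2 and an intercept of 1, since those points lie exactly on y = 2x + 1.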

For the first task, I made a method that updates the linear regression line. It is called whenever a line is present and, like the other update methods, it multiplies the points of the current object by the VTM, effectively mapping it to screen coordinates. The second task was simply to test this method and check that the line moves correctly. The result of testing it on data-simple.csv is below. We also needed to make sure that canceling the regression didn't break anything, so instead of taking action after the dialog box closes, as I did before, the program now takes action only if OK is pressed.
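The update step can be sketched as follows. This is only an illustration under assumptions: it takes the VTM to be a 4x4 matrix stored as nested lists and the line endpoints to be homogeneous points [x, y, z, 1] in normalized data space. The names apply_vtm and update_fit_line are hypothetical and are not the method names used in the project.

```python
def apply_vtm(vtm, points):
    """Multiply each homogeneous point [x, y, z, 1] by the 4x4 matrix vtm."""
    return [
        [sum(vtm[row][col] * point[col] for col in range(4)) for row in range(4)]
        for point in points
    ]


def update_fit_line(vtm, endpoints):
    """Return the screen coordinates (x0, y0, x1, y1) of the regression line.

    endpoints holds the two ends of the line in data space; only the
    transformed x and y values are needed to draw on a 2D canvas.
    """
    start, end = apply_vtm(vtm, endpoints)
    return start[0], start[1], end[0], end[1]


# Example VTM (an assumption for illustration): scale by 400, flip the
# y axis so that larger values are drawn higher, and shift onto the screen.
example_vtm = [
    [400.0, 0.0, 0.0, 50.0],
    [0.0, -400.0, 0.0, 450.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
example_endpoints = [[0.0, 0.25, 0.0, 1.0], [1.0, 0.75, 0.0, 1.0]]
screen_coords = update_fit_line(example_vtm, example_endpoints)
```

In a tkinter program, the four values returned could then be passed to canvas.coords() for the existing line item, so the line is moved rather than redrawn each time the view changes.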
This is the window that pops up when choosing what data to use for the linear regression:

Task 3 was to create the linear_regression function in analysis.py, which returns information about how a dependent column relates to two independent variable columns. I followed the provided instructions to write it. It calculates values such as the standard error, the r^2 value, and the degrees of freedom by manipulating the columns of data. Below is the result of running it on the three provided files:

The last task was to find a dataset online with a strong linear relationship and test our linear regression and our multiple regression function on it. I found data on work-related information, including part-time hours worked, debt, and women's and men's wages. Here is a linear regression showing the relationship between debt and part-time hours. It appears that the more part-time hours are worked, the more debt is accrued, although there is also a cluster of debt with almost no part-time hours worked near the bottom left. Overall, the line shows that there is a fairly even distribution in the data. (Amount of debt is the x axis, and part-time hours worked is the y axis.)

Below is the plot of men's wages vs. women's wages vs. part-time hours worked. The x axis is men's wages, the y axis is women's wages, and the z axis is part-time hours. The rate of change along the x axis is higher than along the y axis, indicating that men's wages are higher than women's at most, if not all, points, even as the number of part-time hours worked increases with the pay.

The result of running the multiple regression method on the data is shown below. The r^2 is 0.79, which means the fit explains 79% of the variance. This population sample is fairly small, so I did not expect very high accuracy. For each of the three axes, we also have t values, which show how close the sample would probably be to the hypothesized value for a bigger population. Since x's t is 0.086, men's wages would likely look about the same, while women's wages were higher than expected and part-time hours worked were lower than expected.
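The quantities discussed here (coefficients, r^2, t values, and degrees of freedom) can be computed roughly as in the sketch below. It is not the linear_regression function from analysis.py: multiple_regression, invert, and the returned dictionary keys are hypothetical, and it is written in plain Python so that it runs without extra libraries. It solves the normal equations directly and leaves out p-values, which would need a t-distribution function.

```python
import math


def invert(matrix):
    """Invert a small square matrix with Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment the matrix with the identity on the right.
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; columns may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def multiple_regression(ind_columns, dep):
    """Fit dep = b0*x0 + b1*x1 + ... + intercept by least squares."""
    n = len(dep)
    # Design matrix: one row per data point, with a trailing 1 for the intercept.
    rows = [[col[i] for col in ind_columns] + [1.0] for i in range(n)]
    c = len(rows[0])
    if n <= c:
        raise ValueError("need more data points than coefficients")

    # Normal equations: b = (A^T A)^-1 A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(c)] for i in range(c)]
    aty = [sum(r[i] * y for r, y in zip(rows, dep)) for i in range(c)]
    ata_inv = invert(ata)
    b = [sum(ata_inv[i][j] * aty[j] for j in range(c)) for i in range(c)]

    residuals = [y - sum(bi * ai for bi, ai in zip(b, r))
                 for r, y in zip(rows, dep)]
    ss_res = sum(e * e for e in residuals)
    mean_y = sum(dep) / n
    ss_total = sum((y - mean_y) ** 2 for y in dep)

    dof_error = n - c   # data points minus fitted coefficients
    dof_model = c - 1   # number of independent variables
    residual_variance = ss_res / dof_error
    stderr = [math.sqrt(residual_variance * ata_inv[i][i]) for i in range(c)]
    # t value of each coefficient: estimate divided by its standard error.
    t = [bi / se if se > 0 else math.inf for bi, se in zip(b, stderr)]
    r2 = 1.0 - ss_res / ss_total if ss_total != 0 else 1.0

    return {"b": b, "stderr": stderr, "t": t, "r2": r2,
            "dof_error": dof_error, "dof_model": dof_model}
```

With two independent columns, the returned "b" list holds the two slopes followed by the intercept, and "t" holds one value per coefficient in the same order.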

For the screenshot extension, I created a new menu command and added a field that counts the pictures taken within the session; it is incremented on each screenshot call. This means that you will overwrite old pictures with the same name (e.g. "screenshot01") if you start a new session and begin taking pictures. In handleScreenshot, I generate a PostScript file from the canvas and save it in the same directory as the program.
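The counter-based naming described above might look like the following sketch. The ScreenshotSaver class, its method names, and the ".ps" extension are assumptions made for illustration and are not the project's handleScreenshot code. The only tkinter call it relies on is Canvas.postscript(file=...), which writes the canvas contents to a PostScript file.

```python
import os


class ScreenshotSaver:
    """Numbers screenshots within one session: screenshot01.ps, screenshot02.ps, ..."""

    def __init__(self, directory="."):
        self.directory = directory
        # The count starts over in every session, so files left over from an
        # earlier session with the same names will be overwritten.
        self.count = 0

    def next_path(self):
        """Increment the counter and return the path for the next screenshot."""
        self.count += 1
        return os.path.join(self.directory, "screenshot%02d.ps" % self.count)

    def save(self, canvas):
        """Write the canvas contents to the next numbered PostScript file."""
        path = self.next_path()
        canvas.postscript(file=path)
        return path
```

A menu command could create one ScreenshotSaver when the program starts and call save() with the drawing canvas each time the command is chosen. To avoid overwriting pictures from earlier sessions, one option would be to skip over names that already exist on disk before saving.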
I also worked on improving my label extension, because I noticed while doing linear regression that switching from two plotted axes to three and then back to two left the z axis label in place. To fix this, I went to buildPoints and, instead of blanking the z axis string in labelList, blanked the actual text object on the canvas using canvas.itemconfigure().

This project was helpful for gaining more experience managing methods and their results within tkinter; I needed to fix many problems with erasing and switching between normal plotting/regression plotting along the way. I learned a bit about how to interpret linear regression lines, as well, which is useful.

Thank you: Melody
