What the provided program does: Does primary component analysis to create a pcadata, finds out how many dimensions are needed to represent most of the data, then does kmeans algorithm using the provided data. The confusion matrix shows how many points belong in each category, with the actual being the column and the result being the row of a given index
I made it have more max_iterations and a smaller min_change. I thought that the more iterations, the more times the mean would be checked for its proximity to each point, which would make it more centered. And lowering the min_change would mean the mean would have to be closer to a point before ceasing to move toward it. But in the end, changing these did not do much.
When plotting kmeans data, the assumption is that the user wants to use the default cluster ids to determine color, but if they specify a color column, that will override the default settings. Also, there is a checkbox for whether to have a gradient of yellow to blue or to have distinct colors.