For this project, we learned how to build two classifiers, both of which are forms of machine learning: a Naive Bayes classifier and a KNN (K Nearest Neighbors) classifier. We trained them on data with known classes. We created a file to test both types, and I let the user specify which classifier to run at the command line. With my implementation, KNN takes much longer than Naive Bayes, but it is also much more accurate. I also tested the classifiers on training and testing datasets I created.
Task 1 was to write a method that creates a confusion matrix. For this, I read in both the correct categories and the ones the classifier assigned. A for loop runs over the points, compares the correct class and the classifier's class at each index, and adds 1 to the cell of the matrix that corresponds to that pair of classes. For example, if a point is in the first class and the classifier put it in the third class, it is counted in confusionMatrix[0,2].
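The counting loop described above can be sketched roughly as follows. This is only an illustration, not the project's actual method: the function name confusion_matrix and its parameters are made up for this example, the classes are assumed to be already mapped to integers starting at 0, and nested lists stand in for the matrix, so the cell is written matrix[0][2] rather than confusionMatrix[0,2].

```python
def confusion_matrix(true_classes, predicted_classes, num_classes):
    """Count how often each true class (row) was given each predicted class (column).

    Hypothetical helper for illustration; assumes integer class labels 0..num_classes-1.
    """
    if len(true_classes) != len(predicted_classes):
        raise ValueError("both label lists must have the same length")

    # Start with a num_classes x num_classes grid of zeros.
    matrix = [[0] * num_classes for _ in range(num_classes)]

    # Compare the two labels at each index and add 1 to the matching cell.
    for true_label, predicted_label in zip(true_classes, predicted_classes):
        matrix[true_label][predicted_label] += 1
    return matrix


# Made-up labels for five points and three classes.
true_classes = [0, 0, 1, 2, 2]
predicted_classes = [2, 0, 1, 2, 1]
matrix = confusion_matrix(true_classes, predicted_classes, 3)

for row in matrix:
    print(row)
```

Here the first point is in the first class but was classified as the third, so it is counted in matrix[0][2]. Correct classifications land on the diagonal.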
Task 2 was to create the file that tests the classifiers, which I called classifyTest.py. It takes as input which training and test files to use, what to call the output file (which is the test file plus a column for the classes the classifier assigned), which classifier to use, and an optional pair of separate header files.
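One way to take these inputs at the command line is with Python's argparse module. The sketch below is an assumption about how such an interface could look, not the actual interface of classifyTest.py: the argument names (train_file, test_file, output_file, --classifier, --headers) and the choice values are all hypothetical.

```python
import argparse


def build_parser():
    """Build a hypothetical command-line parser for a classifier test script."""
    parser = argparse.ArgumentParser(
        description="Train a classifier and label a test set."
    )
    # Argument names below are illustrative assumptions, not taken from the project.
    parser.add_argument("train_file", help="training data file")
    parser.add_argument("test_file", help="test data file")
    parser.add_argument(
        "output_file",
        help="where to write the test data plus a column of assigned classes",
    )
    parser.add_argument(
        "--classifier",
        choices=["naivebayes", "knn"],
        default="naivebayes",
        help="which classifier to run",
    )
    parser.add_argument(
        "--headers",
        nargs=2,
        metavar=("TRAIN_HEADERS", "TEST_HEADERS"),
        default=None,
        help="optional pair of separate header files",
    )
    return parser


# Parse a sample argument list instead of sys.argv so the example runs anywhere.
args = build_parser().parse_args(
    ["train.csv", "test.csv", "out.csv", "--classifier", "knn"]
)
print(args.classifier, args.train_file, args.test_file, args.output_file, args.headers)
```

Restricting --classifier with choices means a misspelled classifier name is rejected before any training starts, which matters when one of the two options is slow to run.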
Task 3 was to run the code on the activity set. The following pictures are my results:
Using Naive Bayes on the activity set:
Using KNN on the activity dataset:
Using Naive Bayes on the activity set with PCA:
Using KNN on the activity set with PCA:
Using the PCA data made the Naive Bayes classification more accurate, while it made KNN slightly less accurate. KNN seems to be more accurate according to the
For task 4, we had to
PCA plot picture (using the output of doing KNN on the activity set with PCA data):