The goal of this project was to find the most common words in a text file of Reddit comments and to track how often a chosen set of words appeared across 8 years of Reddit comments. To find the most common words, the program starts from the word-count files produced in Project 7 (built there with a binary search tree), which list each word and how many times it appeared. These files are read back into a binary search tree, and the word-count pairs are then put into a priority queue. The priority queue removes the pairs in order of count, from most frequent to least, and the program prints that list to the terminal. In an extension, the word-count pairs are read directly into the priority queue. To find trends, the program reads the word-count pairs into a binary search tree, then looks up each word in a given list, returns its frequency (the number of times it appears divided by the total number of words in the document), and prints the resulting list to the terminal. A graph is then made from the results. In further extensions, the results are written to files rather than printed to the terminal. ***add results
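As a rough illustration of the frequency calculation described above, here is a minimal sketch. It is not the project's code: the class WordFrequencySketch and its methods are hypothetical names, java.util.TreeMap stands in for the Project 7 binary search tree, and the total word count is assumed to be the sum of all counts in the map.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; TreeMap stands in for the project's own binary search tree.
class WordFrequencySketch {

    // Assumption: the total number of words is the sum of every word's count.
    static long totalWords(Map<String, Integer> counts) {
        long total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        return total;
    }

    // frequency = (times the word appears) / (total words in the document).
    // A word that is not in the map is treated as appearing 0 times.
    static double frequency(Map<String, Integer> counts, String word) {
        long total = totalWords(counts);
        if (total == 0) {
            return 0.0;
        }
        return counts.getOrDefault(word, 0) / (double) total;
    }

    public static void main(String[] args) {
        // Made-up counts for illustration only, not results from the Reddit data.
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("the", 6);
        counts.put("cat", 3);
        counts.put("sat", 1);

        for (String word : List.of("cat", "dog")) {
            System.out.println(word + " " + frequency(counts, word));
        }
    }
}
```

Running the lookup once per yearly counts file would give one frequency per word per year, which is the kind of data the graph is built from.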
1) The first task was to create a FindCommonWords class that prints the words in a word count file (from Project 7) in order from the most frequent to the least frequent. It has two fields: a word counter object (from Project 7) and a priority queue heap. The heap uses a comparator that takes in key-value pairs, each made of a string (the word) and an integer (the count), and returns the difference between the counts to decide which word occurs more often.
I used the readWordCountFile method of WordCounter from Project 7 to put all of the word-count pairs into a binary search tree. Then, using the getPairs method, I made an ArrayList of the word-count pairs. Finally, I looped over every pair in the ArrayList and added it to the priority queue heap.
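The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: java.util.PriorityQueue and Map.Entry stand in for the project's own priority queue heap and key-value pair classes, the class CommonWordsSketch and its method inCountOrder are hypothetical names, and the list of pairs is built by hand instead of coming from readWordCountFile and getPairs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch; the project uses its own heap and pair classes.
class CommonWordsSketch {

    // Puts the pair with the larger count first. Integer.compare plays the
    // role of "difference between the counts" without any overflow risk.
    static final Comparator<Map.Entry<String, Integer>> BY_COUNT_DESC =
        (a, b) -> Integer.compare(b.getValue(), a.getValue());

    // Adds every pair to the heap, then removes them one at a time so they
    // come out from the highest count to the lowest.
    static List<Map.Entry<String, Integer>> inCountOrder(
            List<Map.Entry<String, Integer>> pairs) {
        PriorityQueue<Map.Entry<String, Integer>> heap =
            new PriorityQueue<>(BY_COUNT_DESC);
        for (Map.Entry<String, Integer> pair : pairs) {
            heap.add(pair);
        }
        List<Map.Entry<String, Integer>> ordered = new ArrayList<>();
        while (!heap.isEmpty()) {
            ordered.add(heap.poll());
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Made-up pairs for illustration only, not results from the Reddit data.
        List<Map.Entry<String, Integer>> pairs = List.of(
            Map.entry("cat", 2), Map.entry("the", 9), Map.entry("dog", 5));
        for (Map.Entry<String, Integer> pair : inCountOrder(pairs)) {
            System.out.println(pair.getKey() + " " + pair.getValue());
        }
    }
}
```

Words with equal counts may come out in any order relative to each other, since the comparator only looks at the count.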