
The goal of this project was to find the most common words in a text document of Reddit comments and to find trends in how often a set of words appeared over 8 years of Reddit comments.  To find the most common words, the program uses the word count files produced in Project 7 with a binary search tree; each file lists every word and how many times it appeared.  These files are read back into a binary search tree, and the word-count pairs are then put into a priority queue.  The priority queue removes the pairs in order of how many times each word appears, and the list is printed to the terminal.  In an extension, the word-count pairs are read directly into a priority queue.  To find trends, the program reads the word-count pairs into a binary search tree, finds each word from a list, computes its frequency (how many times it appears divided by the total number of words in the document), and prints the list of results to the terminal.  A graph is then made from the results.  In extensions, the results are written to files rather than printed to the terminal.    ***add results


1) The first task was to create a FindCommonWords class that prints the words in a word count file (from Project 7) in order from the most occurrences to the fewest.  It has fields for a WordCounter object (from Project 7) and a priority queue heap.  The heap uses a comparator that takes two key-value pairs, each holding a string (the word) and an integer (the count), and returns the difference between the counts to determine which word occurs more often.
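A comparator of the kind described above might look roughly like the following. This is only a sketch: the class names KeyValuePair and CountComparator and the getKey/getValue accessors are assumptions made for illustration, not the actual Project 7 code, and the sample counts are made up.

```java
import java.util.Comparator;

class CountComparatorSketch {
    // Hypothetical stand-in for the Project 7 key-value pair class.
    static class KeyValuePair<K, V> {
        private final K key;
        private final V value;

        KeyValuePair(K key, V value) {
            this.key = key;
            this.value = value;
        }

        K getKey() { return key; }
        V getValue() { return value; }
    }

    // Compares two word-count pairs by count only.
    // Positive result: a occurs more often than b; negative: less often; zero: tie.
    static class CountComparator implements Comparator<KeyValuePair<String, Integer>> {
        @Override
        public int compare(KeyValuePair<String, Integer> a, KeyValuePair<String, Integer> b) {
            // Counts are never negative, so this subtraction cannot overflow.
            return a.getValue() - b.getValue();
        }
    }

    public static void main(String[] args) {
        // Made-up sample counts, for illustration only.
        KeyValuePair<String, Integer> the = new KeyValuePair<>("the", 120);
        KeyValuePair<String, Integer> and = new KeyValuePair<>("and", 85);

        CountComparator comparator = new CountComparator();
        System.out.println(comparator.compare(the, and)); // difference of the two counts
    }
}
```

The heap only needs the sign of the result, so Integer.compare(a.getValue(), b.getValue()) would work equally well in place of the subtraction.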

I used the readWordCountFile method of WordCounter from Project 7 to put all of the word-count pairs into a binary search tree.  Then, using the getPairs method, I made an ArrayList of the word-count pairs.  Finally, I looped over every pair in the ArrayList and added it to the priority queue heap.
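The loop over the pairs, followed by removing them in order, can be sketched as below. To keep the example self-contained it uses java.util.PriorityQueue in place of the project's own priority queue heap, and a hypothetical WordCount class in place of the Project 7 key-value pair; the list of pairs stands in for what getPairs returns, and the sample words and counts are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class FindCommonWordsSketch {
    // Hypothetical pair type standing in for the Project 7 key-value pair.
    static class WordCount {
        final String word;
        final int count;

        WordCount(String word, int count) {
            this.word = word;
            this.count = count;
        }
    }

    // Adds every pair to a priority queue, then removes them one at a time
    // so the words come out from most frequent to least frequent.
    static List<String> mostCommonFirst(List<WordCount> pairs) {
        Comparator<WordCount> byCount = (a, b) -> a.count - b.count;

        // java.util.PriorityQueue removes the smallest element first,
        // so the comparator is reversed to remove the largest count first.
        PriorityQueue<WordCount> heap = new PriorityQueue<>(byCount.reversed());

        for (WordCount pair : pairs) {
            heap.add(pair);
        }

        List<String> ordered = new ArrayList<>();
        while (!heap.isEmpty()) {
            WordCount next = heap.poll();
            ordered.add(next.word + " " + next.count);
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Made-up sample pairs; in the project these come from the word count file.
        List<WordCount> pairs = new ArrayList<>();
        pairs.add(new WordCount("the", 120));
        pairs.add(new WordCount("reddit", 7));
        pairs.add(new WordCount("and", 85));

        for (String line : mostCommonFirst(pairs)) {
            System.out.println(line);
        }
    }
}
```

Adding n pairs and then removing them all takes O(n log n) time in total, the same as sorting the list, but the heap also allows stopping early when only the top few words are needed.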

