To make use of the priority queue (a max-heap), I created the FindLongWords class, which finds the longest word in each file, using the FindCommonWords class as its base.  I wrote KVStringComparator so that the heap is ordered by the length of each word rather than by its frequency.  At first, I expected to see long, complicated words.  However, the results were very different from what I expected.  The longest words were not complicated ones like 'pneumonoultramicroscopicsilicovolcanoconiosis'.  Instead, they were repetitions of a simple word.  Below is the graph showing the length of the longest word in each comment file.  To run the program, give the number of text files to read as a command-line argument.

The longest word was 10000 characters long.  I did not include the words themselves because they were too long to fit on the graph.  From this analysis, I learned that there are people who decide to type 'no' 3000 times, 'haha' 1000 times, or 'lol' nearly 3000 times.  Some of the longest words were merely strings of random digits 5000 characters long.  Online comments can indeed be interesting.
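
As a rough illustration of the length-based ordering described above, here is a minimal sketch.  It is not the actual KVStringComparator or FindLongWords code: the class name LongWordSketch and the method longestWord are hypothetical, and java.util.PriorityQueue stands in for the project's own heap.  Ties between words of equal length are broken alphabetically, which is also an assumption.

```java
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch; these names are illustrative, not the project's classes.
class LongWordSketch {

    // Orders words so that the longest one comes first (highest priority).
    // Words of equal length are ordered alphabetically (an assumed tie-break).
    static final Comparator<String> BY_LENGTH_DESC = (a, b) -> {
        int byLength = Integer.compare(b.length(), a.length());
        return byLength != 0 ? byLength : a.compareTo(b);
    };

    // Returns the longest word in the list, or null if the list is empty.
    static String longestWord(List<String> words) {
        // java.util.PriorityQueue keeps the smallest element (according to
        // the comparator) at its head, so a "longest first" comparator
        // makes it behave like a max-heap on word length.
        PriorityQueue<String> heap = new PriorityQueue<>(BY_LENGTH_DESC);
        heap.addAll(words);
        return heap.peek();
    }
}
```

Calling LongWordSketch.longestWord on the words of one comment file returns that file's longest word, and taking its length gives one data point of the kind plotted in the graph.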