Skip to end of metadata
Go to start of metadata

Abstract

The purpose of this project was to create an efficient way to sort through and store millions of words from Reddit posts. Because of the magnitude of this project, the previous data structures we worked with would take dozens of minutes to process, which is why this program was built around the implementation of a binary search tree. A Binary Search Tree is a doubly-linked list, where each node can have two children. The children's placement is also determined by their relation to their parent node. Within this tree, we are able to store words with the number of times they occurred and search for them without having to iterate through an entire linked list.

Solutions

The Binary Search Tree is built similarly to a Linked List, in which it is comprised of linked nodes. The most challenging part was creating the put method that inserted KeyValuePair nodes and the get method which returned the value for a given key. The creation of both of these methods required recursion in which you would have two get methods and two put methods. My put method was the most complex where nested if statements would use the comparator to traverse the tree from the root and then insert the node. The get method worked similarly where it would search through the tree using recursion and a comparator until it would return the value from the node with the matching key.

 

The Analyze Function worked to use BufferedReader to create a string consisting of the input file. It would then split the string into an array comprised of each word in the string. From there it checks if the word already exists in the tree, updating it if it does and inserting a new node if it does not.

Printouts

Conclusion

This project was where we really dived into the efficiency of a data structure. If I were to use a doubly-linked list, it would have taken dozens of minutes to sort through these text files, instead of the 30 - 70 seconds that it took with the Binary Search Tree.

Acknowledgment

Thank you to Deka Popov for working next to me and helping me debug when I was stuck.

Labels