Skip to end of metadata
Go to start of metadata
This project was focused on testing Hashtables versus Binary Search Trees in sorting word frequencies from Reddit posts. It built upon last week's project as we had already done this using Binary Search Trees. This time we had to implement a Hashtable. A Hashtable is an array that has KeyValuePair placement decided by a hash code. There are various different methods of hashing, an algorithm that creates an index from a key, that can be used to make the data structure more efficient. I was able to build upon last week's Binary Search Tree by moving away from the node structure and introducing a hash method.


A Hashtable requires a Hash algorithm. This is a method that takes in a key, such as the word book, and runs it through Java's String Hash Method to produce a numerical number. It is then modular by the size of the array to have a "hopefully' unique index. However, this is where another problem arises, as its possible and sometimes common for repeated indices to appear. These are called collisions. I was able to solve this in a linear way that would try the following index until an open one appeared. I would also have to ensure that there is a reasonable size of the array, for if the array is too small it would take ages to find an open index.



For my program, I found that both the Hashtable and the Binary Search Tree worked at relatively the same efficiency. There was a slight advantage given to the Hashtable. I attribute this to my method of dealing with collisions. I took on one of the fairly less efficient collision methods which was linear probing. If I had done something such as double hashing I would likely see a more noticeable difference in efficiency between the Hashtable and the Binary Search Tree. This further emphasizes the customization that exists when matching a data structure to a problem in order to find the most efficient solution.