Skip to end of metadata
Go to start of metadata

Abstract

This project tasked us with implementing a priority queue using heaps. We would then use the data structure to identify the most common words from our previous Reddit files. A priority queue is a queue where each entry is given a "priority" which determines the order. A heap is a structure where the parent node is always more or less than its child. I implemented mine by creating an array that would percolate up or down based on the addition or removal of entries. The swapping of two entries would be determined by a priority value. This is so we can attribute the most common word with the highest value.

Solution

To properly achieve a heap, I needed to have a percolate up method and a percolate down method. These would go hand in hand with the remove method and add method. A heap is can be either a min-heap or a max-heap. This project required a max-heap because we would want the most occurring words to go to the top of the queue. Therefore, any new entry would be added to the bottom of the array and then would be swapped using a swap method which would move the higher priority entry towards the top of the queue. The percolate down method did relatively the same but towards the bottom of the queue.

To extract the words of interest, I would have to run each word count file through the Binary Search Tree word counter. I would then have a Binary Search Tree which has a get count method. I could retrieve a count of a word (key) and divide that by the total words to get the frequency.

These trends are interesting, primarily because of the lack of use with all the programming languages. I expected the fall of HTML, as it becomes less prevalent. However, I didn't expect words like python or java to decrease too. It should be said that there could be issues in the data from people using words such a java or python to mean coffee or the snake.

Conclusion

Priority queues are a simple way to find the greatest value or lowest value of any set of information. The implementation is far simpler than Binary Search Trees primarily because of the array-based structure. They are definitely more useful than HashMaps for any organized data set.

Acknowledgment

  • Deka Popov
Labels