Implement Hash Data Structure in Python & Java

0
17
Hash Data Structure

A Hash Data Structure is a powerful and important method in computer science domain for storing and retrieving data in a constant time on average. It uses a hash algorithm to turn keys into array indices, allowing for quick access to associated values. This makes it perfect for creating dictionary, cache, database indexes and many more.

Unlike arrays or list data structures where searching might take linear time, hashing algorithm improves speed when working with big data sets. In this blog we will go over the principles of how a hash data structure works, including its components and how collisions are basically handled.

Most importantly you will learn how to build this structure from scratch using Python and Java, two of the most popular programming languages today.

What is a Hash Data Structure?

A Hash Data Structure is another type of data structure that stores data in key value pairs. It uses a special function which is called a hash function that convert each key into a unique index of an array where the respective value is stored. This allows for fast data retrieval, insertion, and deletion that typically terms in constant time O(1) on average.

The are some main components of hash data structure which includes:

  • Key: The unique identifier for a value like your name or ID.
  • Value: The information or data that is associated to the respective key.
  • Hash Function: This function basically converts the key into an index.
  • Collision Resolution: This is another way to handle situations where you need two keys to produce the same index.
Hash data structure

Implementation of Hash Data Structure in Java

In this section we will be talking about the hash data structure to perform in Java programming languages. Here is a code snippet below that is written in Java.

import java.util.*;

class HashNode<K, V> {
    K key;
    V value;
    HashNode<K, V> next;

    public HashNode(K key, V value) {
        this.key = key;
        this.value = value;
    }
}

class HashTable<K, V> {
    private int size;
    private LinkedList<HashNode<K, V>>[] table;

    public HashTable(int size) {
        this.size = size;
        table = new LinkedList[size];
        for (int i = 0; i < size; i++)
            table[i] = new LinkedList<>();
    }

    private int getHashIndex(K key) {
        return Math.abs(key.hashCode()) % size;
    }

    public void insert(K key, V value) {
        int index = getHashIndex(key);
        for (HashNode<K, V> node : table[index]) {
            if (node.key.equals(key)) {
                node.value = value;
                return;
            }
        }
        table[index].add(new HashNode<>(key, value));
    }

    public V get(K key) {
        int index = getHashIndex(key);
        for (HashNode<K, V> node : table[index]) {
            if (node.key.equals(key))
                return node.value;
        }
        return null;
    }

    public void remove(K key) {
        int index = getHashIndex(key);
        table[index].removeIf(node -> node.key.equals(key));
    }
}

Implementation of Hash Data Structure in Python

In this section we will be talking about the hash data structure to perform in Python programming languages. Here is a code snippet below that is written in python programming language.

class HashTable:
    def __init__(self, size):
        self.size = size
        self.table = [[] for _ in range(size)]

    def _hash_function(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        index = self._hash_function(key)
        for i, (k, v) in enumerate(self.table[index]):
            if k == key:
                self.table[index][i] = (key, value)
                return
        self.table[index].append((key, value))

    def get(self, key):
        index = self._hash_function(key)
        for (k, v) in self.table[index]:
            if k == key:
                return v
        return None

    def remove(self, key):
        index = self._hash_function(key)
        self.table[index] = [(k, v) for (k, v) in self.table[index] if k != key]

# Example usage
ht = HashTable(10)
ht.insert("apple", 5)
ht.insert("banana", 10)
print(ht.get("banana"))  # Output: 10
ht.remove("banana")
print(ht.get("banana"))  # Output: None

Conclusion

The Hash Data Structure is one of the most efficient way to manage key value pairs in programming language. By implementing this data structure from scratch in Python and Java, then you have deepened your understanding of how these systems work under the process. This basic knowledge is not only useful in interviews but also in real-world system design.

Frequently Asked Questions

What is a hash data structure used for?

A hash data structure is primarily used for storing data in key-value pairs, allowing quick access to values using unique keys. It’s widely used in databases, caches, language dictionaries, and real-time applications where fast data retrieval is essential.

How does a hash function work?

A hash function takes a key (such as a string or number) and converts it into an integer that represents an index in an array. The goal is to distribute keys uniformly across the array to avoid collisions and ensure efficient data access.

What is a collision in hashing, and how is it handled?

A collision occurs when two different keys hash to the same index in the array. Collisions are typically handled using techniques like chaining (storing multiple values in a list at the same index) or open addressing (probing for the next available slot in the array).

What is the difference between a HashMap and a Hashtable in Java?

In Java, a HashMap is non-synchronized and allows one null key and multiple null values. On the other hand a Hashtable is synchronized (thread-safe) but doesn’t allow any null keys or values. HashMap is generally preferred for non-threaded applications due to better performance.

What are hash data structures faster than arrays or linked lists for lookups?

Hash data structures offer constant time complexity (O(1)) for lookups on average, thanks to direct indexing via hash functions. In contrast, arrays and linked lists may require linear time (O(n)) to search for an element, especially when the dataset is large.