road-to-software-engineering

Mostly fun stuff


Project maintained by theo-pnv Hosted on GitHub Pages — Theme by mattgraham

Hash Tables

Definition: An improvement over Direct Access Tables (big arrays, e.g. storing all 10 digits phone numbers in an array). Use a hash function to convert a key to a smaller number and use the smaller number as index in a table.

Complexity will be O(1) on average for all operations, but as the result of managing collisions, the hash table may have been filled in such a way that operations are O(n).

How to choose a hash function?

It should:

How to handle collisions?

Chances of a collision with a large table: Very likely (birthday pparadox: with 23 persons, probability of two persons having the same birthday == 50%).

  1. Separate Chaining

    Each cell points to a linked list of records that have the same hash.

    Pros:

    • Simple to implement
    • Hash table never fills up

    Cons:

    • Wastage of space (some cells aren’t used)
    • Must use extra space for links
    • In the worst case the search can become O(n).
  2. Open Adressing

    To insert a new item we keep probing (fr: “Sonder”) until an empty slot is reached.

    Different probing functions:

    • Linear probing: if h[x] full then try h[x] + 1. Problem: Clustering, many consecutive elements form groups and search is expensive. Advantage: Good caching performance, easy to compute.
    • Quadratic probing: if h[x] full then try h[x] + 1 * 1 and then h[x] + 2 * 2.
    • Double hashing: Poor cache performance, more computation time. Less clustering.

Resources