Bloom Filter

Is a probabilistic datastructure that used probability to determine if a particular item in the set.

Under the hood, it's just a bit array with 0 and 1 with a fixed length, for example in here is 18.

Pasted image 20230628141501.png

We have 2 operation: Insert and Look up

Also built-in supports for Redis, Postgres, …

Use cases

  • Checking for existing in database
  • Deduplication

Insert

We perform insert with constant space and time complexity:

  1. Hash the input value
  2. Mod the result by the length of the array
  3. Set corresponding bit to 1

Pasted image 20230628141714.png

Lookup

We perform lookup at a constant space and time as well

  1. Hash the input value
  2. Mod the result by the length of the array
  3. Check if the corresponding bit is 0 or 1

If it's 0 that means it's not there.

Pasted image 20230628141932.png

In this example, the bit at the 10_th_ index is 0, which indicates that the given string isn’t present in the Bloom filter.