Class BloomFilter<T>

java.lang.Object
com.google.common.hash.BloomFilter<T>
Type Parameters:
T - the type of instances that the BloomFilter accepts
All Implemented Interfaces:
com.google.common.base.Predicate<T>, Serializable, java.util.function.Predicate<T>

public final class BloomFilter<T> extends Object implements Predicate<T>, Serializable
A Bloom filter for instances of T. A Bloom filter offers an approximate containment test with one-sided error: if it claims that an element is contained in it, this might be in error, but if it claims that an element is not contained in it, then this is definitely true.
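
A minimal sketch of the one-sided error, assuming Guava is on the classpath. The class OneSidedErrorDemo, its helper buildFilter, and the sample strings are illustrative; Funnels.stringFunnel(Charset) is Guava's built-in funnel for character sequences.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class OneSidedErrorDemo {
    // Builds a filter sized for about 1,000 strings at a 1% target false positive
    // rate and inserts every element of `elements`.
    static BloomFilter<String> buildFilter(List<String> elements) {
        BloomFilter<String> filter =
                BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000, 0.01);
        for (String s : elements) {
            filter.put(s);
        }
        return filter;
    }

    public static void main(String[] args) {
        BloomFilter<String> filter = buildFilter(Arrays.asList("apple", "banana", "cherry"));

        // Inserted elements are always reported as (possibly) present: no false negatives.
        System.out.println(filter.mightContain("apple")); // true

        // A never-inserted element is usually reported absent; if this prints true,
        // it is a false positive, not evidence that "durian" was added.
        System.out.println(filter.mightContain("durian"));
    }
}
```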

If you are unfamiliar with Bloom filters, this nice tutorial may help you understand how they work.

The false positive probability (FPP) of a Bloom filter is defined as the probability that mightContain(Object) will erroneously return true for an object that has not actually been put in the BloomFilter.
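
For intuition, the standard textbook approximation of the FPP for a filter with m bits and k hash functions after n distinct insertions is shown below. It assumes idealized, independent hash positions, so it describes typical behavior rather than a guarantee about this class; the numbers on the example line are illustrative arithmetic only.

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k}
% e.g. n = 1000, m = 9585, k = 7:  p \approx \left(1 - e^{-0.730}\right)^{7} \approx 0.010
```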

Bloom filters are serializable. They also support a more compact serial representation via the writeTo(java.io.OutputStream) and readFrom(java.io.InputStream, com.google.common.hash.Funnel<? super T>) methods. Both serialized forms will continue to be supported by future versions of this library. However, serial forms generated by newer versions of the code may not be readable by older versions of the code (e.g., a serialized Bloom filter generated today may not be readable by a binary that was compiled 6 months ago).
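
The standard Java serialization path can be exercised with an in-memory round trip. This is a sketch assuming Guava on the classpath; JavaSerializationDemo and javaRoundTrip are hypothetical names. It relies on the funnel being serializable (Guava's Funnel interface extends Serializable, and Funnels.stringFunnel supports serialization).

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class JavaSerializationDemo {
    // Serializes `filter` with standard Java serialization and reads it back.
    // Checked exceptions are rethrown unchecked to keep the example short.
    @SuppressWarnings("unchecked")
    static BloomFilter<String> javaRoundTrip(BloomFilter<String> filter) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(filter);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (BloomFilter<String>) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        BloomFilter<String> original =
                BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 500, 0.01);
        original.put("alpha");
        BloomFilter<String> restored = javaRoundTrip(original);
        System.out.println(restored.equals(original));      // true
        System.out.println(restored.mightContain("alpha")); // true
    }
}
```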

As of Guava 23.0, this class is thread-safe and lock-free. It internally uses atomics and compare-and-swap to ensure correctness when multiple threads are used to access it.
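
A sketch of concurrent insertion, assuming Guava 23.0 or later on the classpath: a parallel stream calls put on one shared filter from several threads without any external lock, and every inserted value is then checked with mightContain. ConcurrentPutDemo and its methods are hypothetical names.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.util.stream.IntStream;

public class ConcurrentPutDemo {
    // Inserts 0..n-1 into one shared filter from a parallel stream (the calling thread
    // plus common ForkJoin pool workers) with no external locking; this relies on the
    // thread safety described above.
    static BloomFilter<Integer> fillInParallel(int n) {
        BloomFilter<Integer> filter = BloomFilter.create(Funnels.integerFunnel(), n, 0.01);
        IntStream.range(0, n).parallel().forEach(i -> filter.put(i));
        return filter;
    }

    // True if every inserted value is still reported as possibly present.
    static boolean hasNoFalseNegatives(BloomFilter<Integer> filter, int n) {
        return IntStream.range(0, n).allMatch(i -> filter.mightContain(i));
    }

    public static void main(String[] args) {
        int n = 100_000;
        BloomFilter<Integer> filter = fillInParallel(n);
        System.out.println(hasNoFalseNegatives(filter, n)); // true
    }
}
```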

Since:
11.0 (thread-safe since 23.0)
  • Nested Class Summary

    Nested Classes
    private static class
    (package private) static interface BloomFilter.Strategy
      A strategy to translate T instances to numHashFunctions bit indexes.
  • Field Summary

    Fields
    private final BloomFilterStrategies.LockFreeBitArray bits
      The bit set of the BloomFilter (not necessarily power of 2!)
    private final Funnel<? super T> funnel
      The funnel to translate Ts to bytes
    private final int numHashFunctions
      Number of hashes per element
    private static final long serialVersionUID
    private final BloomFilter.Strategy strategy
      The strategy we employ to map an element T to numHashFunctions bit indexes.
  • Constructor Summary

    Constructors
    private BloomFilter(BloomFilterStrategies.LockFreeBitArray bits, int numHashFunctions, Funnel<? super T> funnel, BloomFilter.Strategy strategy)
      Creates a BloomFilter.
  • Method Summary

    boolean apply(T input)
      Deprecated. Provided only to satisfy the Predicate interface; use mightContain(T) instead.
    long approximateElementCount()
      Returns an estimate for the total number of distinct elements that have been added to this Bloom filter.
    (package private) long bitSize()
      Returns the number of bits in the underlying bit array.
    BloomFilter<T> copy()
      Creates a new BloomFilter that's a copy of this instance.
    static <T> BloomFilter<T> create(Funnel<? super T> funnel, int expectedInsertions)
      Creates a BloomFilter with the expected number of insertions and a default expected false positive probability of 3%.
    static <T> BloomFilter<T> create(Funnel<? super T> funnel, int expectedInsertions, double fpp)
      Creates a BloomFilter with the expected number of insertions and expected false positive probability.
    static <T> BloomFilter<T> create(Funnel<? super T> funnel, long expectedInsertions)
      Creates a BloomFilter with the expected number of insertions and a default expected false positive probability of 3%.
    static <T> BloomFilter<T> create(Funnel<? super T> funnel, long expectedInsertions, double fpp)
      Creates a BloomFilter with the expected number of insertions and expected false positive probability.
    (package private) static <T> BloomFilter<T> create(Funnel<? super T> funnel, long expectedInsertions, double fpp, BloomFilter.Strategy strategy)
    boolean equals(Object object)
      Indicates whether another object is equal to this predicate.
    double expectedFpp()
      Returns the probability that mightContain(Object) will erroneously return true for an object that has not actually been put in the BloomFilter.
    int hashCode()
    boolean isCompatible(BloomFilter<T> that)
      Determines whether a given Bloom filter is compatible with this Bloom filter.
    boolean mightContain(T object)
      Returns true if the element might have been put in this Bloom filter, false if this is definitely not the case.
    (package private) static long optimalNumOfBits(long n, double p)
      Computes m (total bits of Bloom filter) which is expected to achieve, for the specified expected insertions, the required false positive probability.
    (package private) static int optimalNumOfHashFunctions(long n, long m)
      Computes the optimal k (number of hashes per element inserted in Bloom filter), given the expected insertions and total number of bits in the Bloom filter.
    boolean put(T object)
      Puts an element into this BloomFilter.
    void putAll(BloomFilter<T> that)
      Combines this Bloom filter with another Bloom filter by performing a bitwise OR of the underlying data.
    static <T> BloomFilter<T> readFrom(InputStream in, Funnel<? super T> funnel)
      Reads a byte stream, which was written by writeTo(OutputStream), into a BloomFilter.
    private void readObject(ObjectInputStream stream)
    static <T> Collector<T,?,BloomFilter<T>> toBloomFilter(Funnel<? super T> funnel, long expectedInsertions)
      Returns a Collector expecting the specified number of insertions, and yielding a BloomFilter with false positive probability 3%.
    static <T> Collector<T,?,BloomFilter<T>> toBloomFilter(Funnel<? super T> funnel, long expectedInsertions, double fpp)
      Returns a Collector expecting the specified number of insertions, and yielding a BloomFilter with the specified expected false positive probability.
    private Object writeReplace()
    void writeTo(OutputStream out)
      Writes this BloomFilter to an output stream, with a custom format (not Java serialization).

    Methods inherited from class java.lang.Object

    clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface com.google.common.base.Predicate

    test

    Methods inherited from interface java.util.function.Predicate

    and, negate, or
  • Field Details

    • bits

      private final BloomFilterStrategies.LockFreeBitArray bits
      The bit set of the BloomFilter (not necessarily power of 2!)
    • numHashFunctions

      private final int numHashFunctions
      Number of hashes per element
    • funnel

      private final Funnel<? super T> funnel
      The funnel to translate Ts to bytes
    • strategy

      private final BloomFilter.Strategy strategy
      The strategy we employ to map an element T to numHashFunctions bit indexes.
    • serialVersionUID

      private static final long serialVersionUID
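  • Constructor Details

    • BloomFilter

      private BloomFilter(BloomFilterStrategies.LockFreeBitArray bits, int numHashFunctions, Funnel<? super T> funnel, BloomFilter.Strategy strategy)
      Creates a BloomFilter.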

  • Method Details

    • copy

      public BloomFilter<T> copy()
      Creates a new BloomFilter that's a copy of this instance. The new instance is equal to this instance but shares no mutable state.
      Since:
      12.0
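
      A short sketch of the copy semantics, assuming Guava on the classpath and using the built-in Funnels.integerFunnel(); CopyDemo and copyAndAdd are hypothetical names. The copy starts out equal to the original, and mutating it leaves the original untouched.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class CopyDemo {
    // Returns a copy of `original` with `value` added to the copy only.
    static BloomFilter<Integer> copyAndAdd(BloomFilter<Integer> original, int value) {
        BloomFilter<Integer> copy = original.copy();
        copy.put(value);
        return copy;
    }

    public static void main(String[] args) {
        BloomFilter<Integer> original = BloomFilter.create(Funnels.integerFunnel(), 100, 0.01);
        System.out.println(original.copy().equals(original)); // true

        BloomFilter<Integer> modified = copyAndAdd(original, 42);
        // The copy owns its own bit array, so the (still empty) original is unaffected.
        System.out.println(original.mightContain(42)); // false
        System.out.println(modified.mightContain(42)); // true
        System.out.println(modified.equals(original)); // false
    }
}
```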
    • mightContain

      public boolean mightContain(T object)
      Returns true if the element might have been put in this Bloom filter, false if this is definitely not the case.
    • apply

      @Deprecated public boolean apply(T input)
      Deprecated.
      Provided only to satisfy the Predicate interface; use mightContain(T) instead.
      Description copied from interface: Predicate
      Returns the result of applying this predicate to input (Java 8+ users, see the notes in the Predicate class documentation). This method is generally expected, but not absolutely required, to have the following properties:
      • Its execution does not cause any observable side effects.
      • The computation is consistent with equals; that is, Objects.equal(a, b) implies that predicate.apply(a) == predicate.apply(b).
      Specified by:
      apply in interface Predicate<T>
    • put

      public boolean put(T object)
      Puts an element into this BloomFilter. Ensures that subsequent invocations of mightContain(Object) with the same element will always return true.
      Returns:
      true if the Bloom filter's bits changed as a result of this operation. If the bits changed, this is definitely the first time object has been added to the filter. If the bits haven't changed, this might be the first time object has been added to the filter. Note that put(t) always returns the opposite result to what mightContain(t) would have returned at the time it is called.
      Since:
      12.0 (present in 11.0 with void return type)
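
      The return-value contract can be checked directly. In this sketch (Guava assumed on the classpath; PutReturnValueDemo and putAndCheck are hypothetical names), the first put of an element into an empty filter must change bits, and a repeated put of the same element cannot.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

public class PutReturnValueDemo {
    // Inserts `value` and returns put's result, checking the documented relationship
    // that put(t) returns the opposite of what mightContain(t) returned just before.
    // The check only holds if no other thread modifies the filter in between.
    static boolean putAndCheck(BloomFilter<String> filter, String value) {
        boolean mightHaveBeenPresent = filter.mightContain(value);
        boolean bitsChanged = filter.put(value);
        if (bitsChanged == mightHaveBeenPresent) {
            throw new IllegalStateException("unexpected put result for " + value);
        }
        return bitsChanged;
    }

    public static void main(String[] args) {
        BloomFilter<String> filter =
                BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 100, 0.01);
        System.out.println(putAndCheck(filter, "x")); // true: first insertion sets new bits
        System.out.println(putAndCheck(filter, "x")); // false: all of x's bits are already set
    }
}
```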
    • expectedFpp

      public double expectedFpp()
      Returns the probability that mightContain(Object) will erroneously return true for an object that has not actually been put in the BloomFilter.

      Ideally, this number should be close to the fpp parameter passed in create(Funnel, int, double), or smaller. If it is significantly higher, this usually means that more elements than expected have been put in the BloomFilter, degrading its accuracy.

      Since:
      14.0 (since 11.0 as expectedFalsePositiveProbability())
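
      A sketch of how expectedFpp() reacts to overfilling, assuming Guava on the classpath (ExpectedFppDemo and fppAfterInserting are hypothetical names). An empty filter reports 0.0, and inserting far more elements than expectedInsertions pushes the estimate well above the requested fpp. The exact intermediate values depend on the hashing strategy, so the sketch prints them without claiming specific numbers.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class ExpectedFppDemo {
    // Creates a filter sized for `expected` insertions at target `fpp`, inserts the
    // integers 0..count-1, and returns the filter's own estimate of its current FPP.
    static double fppAfterInserting(int expected, double fpp, int count) {
        BloomFilter<Integer> filter = BloomFilter.create(Funnels.integerFunnel(), expected, fpp);
        for (int i = 0; i < count; i++) {
            filter.put(i);
        }
        return filter.expectedFpp();
    }

    public static void main(String[] args) {
        // Empty, at capacity, and overfilled 10x relative to the 1,000 expected insertions.
        for (int count : new int[] {0, 1_000, 10_000}) {
            System.out.printf("%,6d insertions -> expectedFpp %.4f%n",
                    count, fppAfterInserting(1_000, 0.01, count));
        }
    }
}
```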
    • approximateElementCount

      public long approximateElementCount()
      Returns an estimate for the total number of distinct elements that have been added to this Bloom filter. This approximation is reasonably accurate if it does not exceed the value of expectedInsertions that was used when constructing the filter.
      Since:
      22.0
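
      The description above does not say how the estimate is derived. One standard estimator inverts the expected fraction of set bits: with m total bits, k hash functions, and x bits set, n is approximately -(m/k) ln(1 - x/m). The plain-Java sketch below implements that formula; ElementCountEstimate is a hypothetical class, and it illustrates the technique rather than reproducing Guava's code. It also shows why accuracy degrades as the filter saturates: as x approaches m, the logarithm diverges.

```java
public class ElementCountEstimate {
    // Estimates how many distinct elements were inserted, given the filter's total bit
    // count m, its hash count k, and the number x of bits currently set:
    //   n ~= -(m / k) * ln(1 - x / m)
    // Requires 0 <= x < m; at x == m (a saturated filter) the estimate diverges.
    static long estimate(long m, int k, long x) {
        double fractionSet = (double) x / m;
        // Math.log1p(-f) computes ln(1 - f) accurately even when f is tiny.
        return Math.round(-Math.log1p(-fractionSet) * m / k);
    }

    public static void main(String[] args) {
        System.out.println(estimate(1_000, 5, 0));   // 0
        System.out.println(estimate(1_000, 5, 393)); // 100
    }
}
```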
    • bitSize

      long bitSize()
      Returns the number of bits in the underlying bit array.
    • isCompatible

      public boolean isCompatible(BloomFilter<T> that)
      Determines whether a given Bloom filter is compatible with this Bloom filter. For two Bloom filters to be compatible, they must:
      • not be the same instance
      • have the same number of hash functions
      • have the same bit size
      • have the same strategy
      • have equal funnels
      Parameters:
      that - The Bloom filter to check for compatibility.
      Since:
      15.0
    • putAll

      public void putAll(BloomFilter<T> that)
      Combines this Bloom filter with another Bloom filter by performing a bitwise OR of the underlying data. The mutations happen to this instance. Callers must ensure the Bloom filters are appropriately sized to avoid saturating them.
      Parameters:
      that - The Bloom filter to combine this Bloom filter with. It is not mutated.
      Throws:
      IllegalArgumentException - if isCompatible(that) == false
      Since:
      15.0
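
      A sketch combining isCompatible and putAll into a non-destructive union, assuming Guava on the classpath (MergeDemo and union are hypothetical names). Filters created with the same funnel, expected insertions, and fpp are compatible; a filter is never compatible with itself, and doubling the expected size produces a larger bit array, which makes the filters incompatible.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class MergeDemo {
    // Returns a new filter holding the union of `a` and `b`; neither input is modified.
    static BloomFilter<Integer> union(BloomFilter<Integer> a, BloomFilter<Integer> b) {
        BloomFilter<Integer> result = a.copy();
        // putAll would throw IllegalArgumentException for incompatible filters anyway;
        // checking first allows a more specific message.
        if (!result.isCompatible(b)) {
            throw new IllegalArgumentException(
                    "filters must share hash count, bit size, strategy, and funnel");
        }
        result.putAll(b);
        return result;
    }

    public static void main(String[] args) {
        // Identical parameters produce compatible filters.
        BloomFilter<Integer> a = BloomFilter.create(Funnels.integerFunnel(), 1_000, 0.01);
        BloomFilter<Integer> b = BloomFilter.create(Funnels.integerFunnel(), 1_000, 0.01);
        a.put(1);
        b.put(2);

        BloomFilter<Integer> both = union(a, b);
        System.out.println(both.mightContain(1) && both.mightContain(2)); // true
        System.out.println(a.isCompatible(a)); // false: same instance

        BloomFilter<Integer> bigger = BloomFilter.create(Funnels.integerFunnel(), 2_000, 0.01);
        System.out.println(a.isCompatible(bigger)); // false: different bit size
    }
}
```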
    • equals

      public boolean equals(@CheckForNull Object object)
      Description copied from interface: Predicate
      Indicates whether another object is equal to this predicate.

      Most implementations will have no reason to override the behavior of Object.equals(java.lang.Object). However, an implementation may also choose to return true whenever object is a Predicate that it considers interchangeable with this one. "Interchangeable" typically means that this.apply(t) == that.apply(t) for all t of type T. Note that a false result from this method does not imply that the predicates are known not to be interchangeable.

      Specified by:
      equals in interface Predicate<T>
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • toBloomFilter

      public static <T> Collector<T,?,BloomFilter<T>> toBloomFilter(Funnel<? super T> funnel, long expectedInsertions)
      Returns a Collector expecting the specified number of insertions, and yielding a BloomFilter with false positive probability 3%.

      Note that if the Collector receives significantly more elements than specified, the resulting BloomFilter will suffer a sharp deterioration of its false positive probability.

      The constructed BloomFilter will be serializable if the provided Funnel<T> is.

      It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.

      Parameters:
      funnel - the funnel of T's that the constructed BloomFilter will use
      expectedInsertions - the number of expected insertions to the constructed BloomFilter; must be positive
      Returns:
      a Collector generating a BloomFilter of the received elements
      Since:
      23.0
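
      A sketch of collecting a stream into a filter, assuming Guava 23.0 or later on the classpath (CollectorDemo and index are hypothetical names). The explicit type witness <String> makes T explicit, since the string funnel is declared as a Funnel<CharSequence>; Math.max keeps expectedInsertions positive for an empty list.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class CollectorDemo {
    // Collects `words` into a filter sized for the list's length (at least 1),
    // using the collector's default 3% false positive probability.
    static BloomFilter<String> index(List<String> words) {
        return words.stream()
                .collect(BloomFilter.<String>toBloomFilter(
                        Funnels.stringFunnel(StandardCharsets.UTF_8), Math.max(1, words.size())));
    }

    public static void main(String[] args) {
        BloomFilter<String> colors = index(Arrays.asList("red", "green", "blue"));
        System.out.println(colors.mightContain("green")); // true
    }
}
```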
    • toBloomFilter

      public static <T> Collector<T,?,BloomFilter<T>> toBloomFilter(Funnel<? super T> funnel, long expectedInsertions, double fpp)
      Returns a Collector expecting the specified number of insertions, and yielding a BloomFilter with the specified expected false positive probability.

      Note that if the Collector receives significantly more elements than specified, the resulting BloomFilter will suffer a sharp deterioration of its false positive probability.

      The constructed BloomFilter will be serializable if the provided Funnel<T> is.

      It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.

      Parameters:
      funnel - the funnel of T's that the constructed BloomFilter will use
      expectedInsertions - the number of expected insertions to the constructed BloomFilter; must be positive
      fpp - the desired false positive probability (must be positive and less than 1.0)
      Returns:
      a Collector generating a BloomFilter of the received elements
      Since:
      23.0
    • create

      public static <T> BloomFilter<T> create(Funnel<? super T> funnel, int expectedInsertions, double fpp)
      Creates a BloomFilter with the expected number of insertions and expected false positive probability.

      Note that overflowing a BloomFilter with significantly more elements than specified will result in its saturation and a sharp deterioration of its false positive probability.

      The constructed BloomFilter will be serializable if the provided Funnel<T> is.

      It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.

      Parameters:
      funnel - the funnel of T's that the constructed BloomFilter will use
      expectedInsertions - the number of expected insertions to the constructed BloomFilter; must be positive
      fpp - the desired false positive probability (must be positive and less than 1.0)
      Returns:
      a BloomFilter
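
      A sketch of the enum-funnel recommendation, assuming Guava on the classpath. Person, PersonFunnel, and newPersonFilter are hypothetical; the funnel writes the id and name into the PrimitiveSink, and because it is a single-element enum, deserialization resolves to the same INSTANCE, so funnel identity (and therefore equals) survives a round trip.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnel;
import com.google.common.hash.PrimitiveSink;
import java.nio.charset.StandardCharsets;

public class EnumFunnelDemo {
    // Hypothetical value type used only for this example.
    static final class Person {
        final int id;
        final String name;

        Person(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Funnel implemented as a single-element enum: every deserialized reference
    // resolves to INSTANCE, so identity-based funnel comparisons keep working.
    enum PersonFunnel implements Funnel<Person> {
        INSTANCE;

        @Override
        public void funnel(Person person, PrimitiveSink into) {
            into.putInt(person.id).putString(person.name, StandardCharsets.UTF_8);
        }
    }

    static BloomFilter<Person> newPersonFilter(int expectedInsertions) {
        return BloomFilter.create(PersonFunnel.INSTANCE, expectedInsertions, 0.01);
    }

    public static void main(String[] args) {
        BloomFilter<Person> people = newPersonFilter(1_000);
        people.put(new Person(1, "Ada"));
        // Membership is decided from the funneled bytes, so a new instance with the
        // same fields matches even though Person does not override equals.
        System.out.println(people.mightContain(new Person(1, "Ada"))); // true
    }
}
```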
    • create

      public static <T> BloomFilter<T> create(Funnel<? super T> funnel, long expectedInsertions, double fpp)
      Creates a BloomFilter with the expected number of insertions and expected false positive probability.

      Note that overflowing a BloomFilter with significantly more elements than specified will result in its saturation and a sharp deterioration of its false positive probability.

      The constructed BloomFilter will be serializable if the provided Funnel<T> is.

      It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.

      Parameters:
      funnel - the funnel of T's that the constructed BloomFilter will use
      expectedInsertions - the number of expected insertions to the constructed BloomFilter; must be positive
      fpp - the desired false positive probability (must be positive and less than 1.0)
      Returns:
      a BloomFilter
      Since:
      19.0
    • create

      static <T> BloomFilter<T> create(Funnel<? super T> funnel, long expectedInsertions, double fpp, BloomFilter.Strategy strategy)
    • create

      public static <T> BloomFilter<T> create(Funnel<? super T> funnel, int expectedInsertions)
      Creates a BloomFilter with the expected number of insertions and a default expected false positive probability of 3%.

      Note that overflowing a BloomFilter with significantly more elements than specified will result in its saturation and a sharp deterioration of its false positive probability.

      The constructed BloomFilter will be serializable if the provided Funnel<T> is.

      It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.

      Parameters:
      funnel - the funnel of T's that the constructed BloomFilter will use
      expectedInsertions - the number of expected insertions to the constructed BloomFilter; must be positive
      Returns:
      a BloomFilter
    • create

      public static <T> BloomFilter<T> create(Funnel<? super T> funnel, long expectedInsertions)
      Creates a BloomFilter with the expected number of insertions and a default expected false positive probability of 3%.

      Note that overflowing a BloomFilter with significantly more elements than specified will result in its saturation and a sharp deterioration of its false positive probability.

      The constructed BloomFilter will be serializable if the provided Funnel<T> is.

      It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.

      Parameters:
      funnel - the funnel of T's that the constructed BloomFilter will use
      expectedInsertions - the number of expected insertions to the constructed BloomFilter; must be positive
      Returns:
      a BloomFilter
      Since:
      19.0
    • optimalNumOfHashFunctions

      static int optimalNumOfHashFunctions(long n, long m)
      Computes the optimal k (number of hashes per element inserted in Bloom filter), given the expected insertions and total number of bits in the Bloom filter.

      See http://en.wikipedia.org/wiki/File:Bloom_filter_fp_probability.svg for the formula.

      Parameters:
      n - expected insertions (must be positive)
      m - total number of bits in Bloom filter (must be positive)
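
      A plain-Java sketch of the textbook optimum k = (m/n) ln 2, rounded and clamped to at least 1 (the relationship plotted in the Wikipedia figure linked above). OptimalHashCount is a hypothetical class; it illustrates the formula and is not necessarily identical to this package-private method's code. At about 9.6 bits per element the formula gives k = 7.

```java
public class OptimalHashCount {
    // k = (m / n) * ln 2, rounded to the nearest integer and never less than 1.
    static int optimalNumOfHashFunctions(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        System.out.println(optimalNumOfHashFunctions(1_000, 9_585)); // 7
        System.out.println(optimalNumOfHashFunctions(1_000, 100));   // 1 (clamped)
    }
}
```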
    • optimalNumOfBits

      static long optimalNumOfBits(long n, double p)
      Computes m (total bits of Bloom filter) which is expected to achieve, for the specified expected insertions, the required false positive probability.

      See http://en.wikipedia.org/wiki/Bloom_filter#Probability_of_false_positives for the formula.

      Parameters:
      n - expected insertions (must be positive)
      p - false positive rate (must satisfy 0 < p < 1)
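
      A plain-Java sketch of the textbook optimum m = -n ln p / (ln 2)^2, truncated to whole bits. OptimalBitCount is a hypothetical class; it illustrates the formula rather than reproducing this package-private method verbatim. For n = 1000 and p = 0.01 it gives roughly 9.6 bits per element.

```java
public class OptimalBitCount {
    // m = -n * ln(p) / (ln 2)^2, truncated to a whole number of bits.
    static long optimalNumOfBits(long n, double p) {
        if (!(p > 0 && p < 1)) {
            throw new IllegalArgumentException("p must satisfy 0 < p < 1, got " + p);
        }
        double ln2 = Math.log(2);
        return (long) (-n * Math.log(p) / (ln2 * ln2));
    }

    public static void main(String[] args) {
        System.out.println(optimalNumOfBits(1_000, 0.01)); // 9585
    }
}
```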
    • writeReplace

      private Object writeReplace()
    • readObject

      private void readObject(ObjectInputStream stream) throws InvalidObjectException
      Throws:
      InvalidObjectException
    • writeTo

      public void writeTo(OutputStream out) throws IOException
      Writes this BloomFilter to an output stream, with a custom format (not Java serialization). This has been measured to save at least 400 bytes compared to regular serialization.

      Use readFrom(InputStream, Funnel) to reconstruct the written BloomFilter.

      Throws:
      IOException
    • readFrom

      public static <T> BloomFilter<T> readFrom(InputStream in, Funnel<? super T> funnel) throws IOException
      Reads a byte stream, which was written by writeTo(OutputStream), into a BloomFilter.

      The Funnel to be used is not encoded in the stream, so it must be provided here. Warning: the funnel provided must behave identically to the one used to populate the original Bloom filter!

      Throws:
      IOException - if the InputStream throws an IOException, or if its data does not appear to be a BloomFilter serialized using the writeTo(OutputStream) method.
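
      A sketch of a compact round trip through an in-memory buffer, assuming Guava on the classpath (CompactFormDemo and roundTrip are hypothetical names). The same funnel, here Funnels.integerFunnel(), is passed to readFrom because the stream does not encode it.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class CompactFormDemo {
    // Writes `filter` in the compact format and reads it back with the same funnel.
    static BloomFilter<Integer> roundTrip(BloomFilter<Integer> filter) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            filter.writeTo(out);
            return BloomFilter.readFrom(
                    new ByteArrayInputStream(out.toByteArray()), Funnels.integerFunnel());
        } catch (IOException e) {
            // In-memory streams are not expected to fail; rethrow unchecked for brevity.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BloomFilter<Integer> original = BloomFilter.create(Funnels.integerFunnel(), 100, 0.01);
        original.put(7);
        BloomFilter<Integer> restored = roundTrip(original);
        System.out.println(restored.equals(original)); // true
        System.out.println(restored.mightContain(7));  // true
    }
}
```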