Package com.bakdata.dedupe.clustering
Class ConsistentClustering<C extends java.lang.Comparable<C>,T,I extends java.lang.Comparable<? super I>>
- java.lang.Object
  - com.bakdata.dedupe.clustering.ConsistentClustering<C,T,I>
- All Implemented Interfaces: Clustering<C,T>
public final class ConsistentClustering<C extends java.lang.Comparable<C>,T,I extends java.lang.Comparable<? super I>> extends java.lang.Object implements Clustering<C,T>
Wraps another clustering and keeps a cluster together when the wrapped clustering would split it.
Example: consider a stable marriage-based clustering where A1-B have been previously matched and subsequently clustered. If a strong A2-B match would replace that pair and thus split the cluster, this consistent clustering returns a cluster [A1, A2, B] instead.
It thus trades off clustering accuracy to increase reliability of subsequent data processing.
This clustering is similar to TransitiveClosure but allows the wrapped clustering to split temporary (= not-returned) clusters. Thus, in the example above, we have the following two situations:
- If A1-B and A2-B are passed in the same invocation of cluster(Stream), only cluster [A2, B] is returned.
- If A1-B is passed in a first invocation, that invocation returns [A1, B]. The following invocation with A2-B then returns [A1, A2, B].
- Implementation Note:
- This implementation materializes all clusters returned by Clustering.
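The two situations above can be illustrated with a small, self-contained sketch. It does not use the dedupe library and is not this class's actual implementation: the class `ConsistentClusteringSketch`, its use of plain `String` record ids, and the idea of passing in the wrapped clustering's output as a list of sets are all assumptions made for illustration. The sketch only remembers clusters it has already returned and merges any newly proposed cluster with the remembered clusters it overlaps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not the library's ConsistentClustering.
public class ConsistentClusteringSketch {
    // Maps each record id to the cluster that was last returned for it.
    private final Map<String, Set<String>> returned = new HashMap<>();

    // 'proposed' stands in for the clusters that a wrapped clustering
    // produced for one invocation (an assumption of this sketch).
    public List<Set<String>> cluster(List<Set<String>> proposed) {
        List<Set<String>> result = new ArrayList<>();
        for (Set<String> cluster : proposed) {
            // Sorted set, so the printed order is deterministic.
            Set<String> merged = new TreeSet<>(cluster);
            for (String id : cluster) {
                Set<String> previous = returned.get(id);
                if (previous != null) {
                    // Keep previously returned clusters together.
                    merged.addAll(previous);
                }
            }
            for (String id : merged) {
                returned.put(id, merged);
            }
            result.add(merged);
        }
        return result;
    }

    public static void main(String[] args) {
        // Situation 1: A1-B and A2-B arrive in the same invocation. The wrapped
        // clustering already settled on [A2, B]; nothing was returned before,
        // so the proposal passes through unchanged.
        ConsistentClusteringSketch sameInvocation = new ConsistentClusteringSketch();
        System.out.println(sameInvocation.cluster(List.of(Set.of("A2", "B"))));

        // Situation 2: A1-B is returned first, then A2-B arrives later.
        ConsistentClusteringSketch twoInvocations = new ConsistentClusteringSketch();
        System.out.println(twoInvocations.cluster(List.of(Set.of("A1", "B"))));
        System.out.println(twoInvocations.cluster(List.of(Set.of("A2", "B"))));
    }
}
```

In the second situation the sketch returns [A1, B] and then [A1, A2, B], because [A1, B] had already been handed to the caller and is therefore not split.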
-
-
Method Summary
Modifier and Type, Method, Description:
- static <C extends java.lang.Comparable<C>,T,I extends java.lang.Comparable<? super I>> ConsistentClustering.ConsistentClusteringBuilder<C,T,I> builder()
- @NonNull java.util.stream.Stream<Cluster<C,T>> cluster(@NonNull java.util.stream.Stream<ClassifiedCandidate<T>> classifiedCandidates): Creates a coherent Cluster from a list of ClassifiedCandidates.
- boolean equals(java.lang.Object o)
- @NonNull java.util.function.Function<? super java.lang.Iterable<? extends T>,C> getClusterIdGenerator(): The cluster id generator that is used to create an id for a new cluster.
- @NonNull Clustering<C,T> getClustering(): The wrapped clustering.
- @NonNull java.util.function.Function<T,I> getIdExtractor(): A function to extract the id of a record for efficient, internal data structures.
- int hashCode()
- java.lang.String toString()
-
-
-
Method Detail
-
cluster
@NonNull public @NonNull java.util.stream.Stream<Cluster<C,T>> cluster(@NonNull @NonNull java.util.stream.Stream<ClassifiedCandidate<T>> classifiedCandidates)
Description copied from interface: Clustering
Creates a coherent Cluster from a list of ClassifiedCandidates.
- Specified by:
- cluster in interface Clustering<C extends java.lang.Comparable<C>,T>
- Parameters:
- classifiedCandidates - the list of classified candidates.
- Returns:
- a coherent cluster over the classified candidates.
-
getClusterIdGenerator
@NonNull public @NonNull java.util.function.Function<? super java.lang.Iterable<? extends T>,C> getClusterIdGenerator()
Description copied from interface: Clustering
The cluster id generator that is used to create an id for a new cluster.
- Specified by:
- getClusterIdGenerator in interface Clustering<C extends java.lang.Comparable<C>,T>
- Returns:
- the cluster id generator.
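As a rough illustration of the function shape returned here, the following sketch defines a generator that maps the records of a cluster to an id. The class `ClusterIdGeneratorSketch`, the constant `SMALLEST_RECORD`, and the choice of `String` for both the record type and the cluster id are assumptions for this example; the library may ship different generators.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical example of a cluster id generator; not part of the library.
public class ClusterIdGeneratorSketch {
    // Uses the lexicographically smallest record as the cluster id.
    static final Function<Iterable<? extends String>, String> SMALLEST_RECORD = records -> {
        String smallest = null;
        for (String record : records) {
            if (smallest == null || record.compareTo(smallest) < 0) {
                smallest = record;
            }
        }
        if (smallest == null) {
            throw new IllegalArgumentException("Cannot create an id for an empty cluster");
        }
        return smallest;
    };

    public static void main(String[] args) {
        System.out.println(SMALLEST_RECORD.apply(List.of("B", "A2", "A1"))); // prints A1
    }
}
```

A generator of this kind yields the same id regardless of the order in which the records are iterated, which may be convenient when clusters are rebuilt across invocations.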
-
builder
public static <C extends java.lang.Comparable<C>,T,I extends java.lang.Comparable<? super I>> ConsistentClustering.ConsistentClusteringBuilder<C,T,I> builder()
-
getClustering
@NonNull public @NonNull Clustering<C,T> getClustering()
The wrapped clustering.
-
getIdExtractor
@NonNull public @NonNull java.util.function.Function<T,I> getIdExtractor()
A function to extract the id of a record for efficient, internal data structures.
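An id extractor matching the signature above can be as small as a method reference. The sketch below assumes a hypothetical record type `Person` with a `String` id; neither the type nor the constant `ID_EXTRACTOR` comes from the library. Any id type works as long as it satisfies the bound on `I`, that is, it implements `java.lang.Comparable`.

```java
import java.util.function.Function;

// Hypothetical example of an id extractor; Person is not a library type.
public class IdExtractorSketch {
    static final class Person {
        private final String id;
        private final String name;

        Person(String id, String name) {
            this.id = id;
            this.name = name;
        }

        String getId() {
            return id;
        }

        String getName() {
            return name;
        }
    }

    // Function<T,I> with T = Person and I = String (String is Comparable<String>).
    static final Function<Person, String> ID_EXTRACTOR = Person::getId;

    public static void main(String[] args) {
        Person person = new Person("p-1", "Ada");
        System.out.println(ID_EXTRACTOR.apply(person)); // prints p-1
    }
}
```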
-
equals
public boolean equals(java.lang.Object o)
- Overrides:
- equals in class java.lang.Object
-
hashCode
public int hashCode()
- Overrides:
- hashCode in class java.lang.Object
-
toString
public java.lang.String toString()
- Overrides:
- toString in class java.lang.Object
-
-