Class ConsistentClustering<C extends java.lang.Comparable<C>,​T,​I extends java.lang.Comparable<? super I>>

  • All Implemented Interfaces:
    Clustering<C,​T>

    public final class ConsistentClustering<C extends java.lang.Comparable<C>,​T,​I extends java.lang.Comparable<? super I>>
    extends java.lang.Object
    implements Clustering<C,​T>
    Wraps another clustering and keeps a cluster together when the wrapped clustering would split it.
    Example: consider a stable-marriage-based clustering in which A1-B have previously been matched and subsequently clustered. If a strong A2-B match would replace that pair and thus split the cluster, this consistent clustering returns the cluster [A1, A2, B] instead.

    This clustering is similar to TransitiveClosure but allows the wrapped clustering to split temporary clusters, i.e. clusters that have not yet been returned. Thus, in the example above, there are two situations:
    - If A1-B and A2-B are passed in the same invocation of cluster(Stream), only the cluster [A2, B] is returned.
    - If A1-B is passed in a first invocation, that invocation returns [A1, B]. A following invocation with A2-B then returns [A1, A2, B].

    It thus trades some clustering accuracy for more reliable subsequent data processing, because a cluster that has been returned once is never split afterwards.
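
    The two situations described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions, not the library's implementation: the names ConsistentMergeSketch, Match and bestPartnerOnly are hypothetical, the scores 0.8 and 0.9 are made up, records are plain strings, and lists stand in for streams. The toy wrapped clustering keeps only the best-scoring partner per right-hand record, which loosely mimics the stable-marriage behaviour in the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

/** Toy model of the merging behaviour; all names here are hypothetical. */
final class ConsistentMergeSketch {

    /** Hypothetical stand-in for a classified candidate pair with a match score. */
    static final class Match {
        final String left;
        final String right;
        final double score;

        Match(String left, String right, double score) {
            this.left = left;
            this.right = right;
            this.score = score;
        }
    }

    /** For each record id, the cluster that was last returned to the caller. */
    private final Map<String, Set<String>> returned = new HashMap<>();
    private final Function<List<Match>, List<Set<String>>> wrapped;

    ConsistentMergeSketch(Function<List<Match>, List<Set<String>>> wrapped) {
        this.wrapped = wrapped;
    }

    /** Delegates to the wrapped clustering, then re-attaches previously returned members. */
    List<Set<String>> cluster(List<Match> matches) {
        List<Set<String>> result = new ArrayList<>();
        for (Set<String> fresh : wrapped.apply(matches)) {
            Set<String> merged = new TreeSet<>(fresh);
            for (String id : fresh) {
                merged.addAll(returned.getOrDefault(id, Collections.<String>emptySet()));
            }
            // Remember the merged cluster for every member, old and new.
            for (String id : merged) {
                returned.put(id, merged);
            }
            result.add(merged);
        }
        return result;
    }

    /** Toy wrapped clustering: each right-hand record keeps only its best-scoring partner. */
    static List<Set<String>> bestPartnerOnly(List<Match> matches) {
        Map<String, Match> best = new LinkedHashMap<>();
        for (Match m : matches) {
            best.merge(m.right, m, (a, b) -> a.score >= b.score ? a : b);
        }
        List<Set<String>> clusters = new ArrayList<>();
        for (Match m : best.values()) {
            clusters.add(new TreeSet<>(Arrays.asList(m.left, m.right)));
        }
        return clusters;
    }

    public static void main(String[] args) {
        Match a1b = new Match("A1", "B", 0.8);
        Match a2b = new Match("A2", "B", 0.9);

        // Same invocation: the wrapped clustering may still drop A1-B.
        ConsistentMergeSketch oneCall = new ConsistentMergeSketch(ConsistentMergeSketch::bestPartnerOnly);
        System.out.println(oneCall.cluster(Arrays.asList(a1b, a2b))); // [[A2, B]]

        // Separate invocations: [A1, B] was already returned, so it is kept together.
        ConsistentMergeSketch twoCalls = new ConsistentMergeSketch(ConsistentMergeSketch::bestPartnerOnly);
        System.out.println(twoCalls.cluster(Arrays.asList(a1b)));     // [[A1, B]]
        System.out.println(twoCalls.cluster(Arrays.asList(a2b)));     // [[A1, A2, B]]
    }
}
```

    The sketch keeps every returned cluster in memory, in line with the implementation note below. It does not handle the case where two new clusters from the same invocation both touch one previously returned cluster; a fuller version would need to merge those as well.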
    Implementation Note:
    This implementation materializes all clusters returned by the wrapped Clustering.
    • Method Detail

      • cluster

        public @NonNull java.util.stream.Stream<Cluster<C,T>> cluster(@NonNull java.util.stream.Stream<ClassifiedCandidate<T>> classifiedCandidates)
        Description copied from interface: Clustering
        Creates coherent Clusters from a stream of ClassifiedCandidates.
        Specified by:
        cluster in interface Clustering<C extends java.lang.Comparable<C>,T>
        Parameters:
        classifiedCandidates - the stream of classified candidates.
        Returns:
        a stream of coherent clusters over the classified candidates.
      • getClusterIdGenerator

        public @NonNull java.util.function.Function<? super java.lang.Iterable<? extends T>,C> getClusterIdGenerator()
        Description copied from interface: Clustering
        The cluster id generator that is used to create an id for a new cluster.
        Specified by:
        getClusterIdGenerator in interface Clustering<C extends java.lang.Comparable<C>,​T>
        Returns:
        the cluster id generator.
      • getClustering

        public @NonNull Clustering<C,T> getClustering()
        The wrapped clustering.
      • getIdExtractor

        public @NonNull java.util.function.Function<T,I> getIdExtractor()
        A function that extracts the id of a record; the ids are used as keys in efficient internal data structures.
      • equals

        public boolean equals​(java.lang.Object o)
        Overrides:
        equals in class java.lang.Object
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object