Package com.bakdata.dedupe.duplicate_detection.online
Online duplicate detection to find duplicate clusters in streaming data.
Interface Summary
Interface    Description
OnlineDuplicateDetection<C extends java.lang.Comparable<C>,T>    An online duplicate detection algorithm processes a stream of records and returns all changed Clusters for each record.
Class Summary
Class    Description
OnlinePairBasedDuplicateDetection<C extends java.lang.Comparable<C>,T>    A pair-based duplicate detection algorithm, which performs OnlineCandidateSelection, applies a Classifier to the found Candidates, and transforms the found duplicate pairs with a Clustering into Clusters.
OnlinePairBasedDuplicateDetection.OnlinePairBasedDuplicateDetectionBuilder<C extends java.lang.Comparable<C>,T>
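
The three steps of the pair-based algorithm (candidate selection, classification, clustering) can be illustrated with a minimal, self-contained sketch. It does not use this package's API: OnlinePipelineSketch, Detector, and detect are hypothetical names, both the candidate selection and the classifier are modeled as plain BiPredicate instances, and the clustering step is assumed to be a simple transitive closure. The cluster id type parameter C of the real interfaces is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical sketch of an online pair-based pipeline; not the library's API. */
public class OnlinePipelineSketch {

    public static final class Detector<T> {
        private final List<T> seen = new ArrayList<>();            // records processed so far
        private final List<List<T>> clusters = new ArrayList<>();  // current clusters
        private final BiPredicate<T, T> candidateSelection;        // step 1 (assumed shape)
        private final BiPredicate<T, T> classifier;                // step 2 (assumed shape)

        public Detector(BiPredicate<T, T> candidateSelection, BiPredicate<T, T> classifier) {
            this.candidateSelection = candidateSelection;
            this.classifier = classifier;
        }

        /** Processes one record and returns the clusters changed by it. */
        public List<List<T>> detect(T record) {
            // Steps 1 and 2: select candidate pairs, keep those classified as duplicates.
            List<T> duplicates = new ArrayList<>();
            for (T other : seen) {
                if (candidateSelection.test(record, other) && classifier.test(record, other)) {
                    duplicates.add(other);
                }
            }
            seen.add(record);

            // Step 3: transitive-closure clustering (an assumption of this sketch).
            // Every cluster containing a duplicate is merged with the new record.
            List<T> merged = new ArrayList<>();
            Iterator<List<T>> it = clusters.iterator();
            while (it.hasNext()) {
                List<T> cluster = it.next();
                if (!Collections.disjoint(cluster, duplicates)) {
                    merged.addAll(cluster);
                    it.remove();
                }
            }
            merged.add(record);
            clusters.add(merged);
            return Collections.singletonList(merged);
        }
    }

    public static void main(String[] args) {
        Detector<String> detector = new Detector<>(
                // Candidate selection: only compare names with the same first letter.
                (a, b) -> Character.toLowerCase(a.charAt(0)) == Character.toLowerCase(b.charAt(0)),
                // Classifier: case-insensitive equality counts as a duplicate.
                String::equalsIgnoreCase);
        System.out.println(detector.detect("Anna")); // [[Anna]]
        System.out.println(detector.detect("Bob"));  // [[Bob]]
        System.out.println(detector.detect("anna")); // [[Anna, anna]]
    }
}
```

Each call returns only the cluster affected by the new record, which mirrors the "returns all changed Clusters for each record" contract described in the interface summary. In this sketch a record always changes exactly one cluster; an implementation with a different clustering strategy may report several.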