Package | Description |
---|---|
com.bakdata.dedupe.candidate_selection | Base data structures shared by online and offline candidate selections that choose promising pairs to limit the search space for duplicates. |
com.bakdata.dedupe.candidate_selection.offline | Interfaces and implementations for offline candidate selections that choose promising pairs to limit the search space for duplicates in a materialized dataset. |
com.bakdata.dedupe.candidate_selection.online | Interfaces and implementations for online candidate selections that choose promising pairs to limit the search space for duplicates in a streaming dataset. |
com.bakdata.dedupe.classifier | Interfaces, data structures, and implementations for the classification of Candidate pairs into duplicates and non-duplicates. |
com.bakdata.dedupe.clustering | Clusters ClassifiedCandidates into coherent clusters. |
com.bakdata.dedupe.deduplication | Provides interfaces and implementations for a full deduplication process, which ensures that no duplicate record is emitted. |
com.bakdata.dedupe.deduplication.offline | Full offline deduplication for materialized data. |
com.bakdata.dedupe.deduplication.online | Full online deduplication for streaming data. |
com.bakdata.dedupe.duplicate_detection | Provides base interfaces and implementations for finding duplicate clusters. |
com.bakdata.dedupe.duplicate_detection.offline | Offline duplicate detection to find duplicate clusters in materialized data. |
com.bakdata.dedupe.duplicate_detection.online | Online duplicate detection to find duplicate clusters in streaming data. |
com.bakdata.dedupe.fusion | Provides means to reconcile a duplicate cluster into a consistent representation. |
com.bakdata.dedupe.matching | Assigns and matches nodes of a bipartite graph. |
com.bakdata.dedupe.similarity | Data structures, interfaces, and implementations to define similarity measures that are ultimately used to detect duplicates. |
com.bakdata.util | Utility classes that are not part of the public API. |
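
The stages named in the package descriptions above (candidate selection, similarity and classification, clustering, fusion) can be pictured as a small pipeline. The following is a minimal, self-contained sketch of that general idea on plain strings. It does not use the library's API: the class `DedupeSketch`, all of its method names, the first-letter blocking key, and the case-insensitive equality check are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from com.bakdata.dedupe.
class DedupeSketch {

    // Candidate selection: only pair records that share a blocking key
    // (assumed here: the lower-cased first character), instead of comparing all pairs.
    static List<int[]> selectCandidates(List<String> records) {
        Map<String, List<Integer>> blocks = new LinkedHashMap<>();
        for (int i = 0; i < records.size(); i++) {
            String r = records.get(i);
            String key = r.isEmpty() ? "" : r.substring(0, 1).toLowerCase();
            blocks.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
        }
        List<int[]> pairs = new ArrayList<>();
        for (List<Integer> block : blocks.values()) {
            for (int i = 0; i < block.size(); i++) {
                for (int j = i + 1; j < block.size(); j++) {
                    pairs.add(new int[] {block.get(i), block.get(j)});
                }
            }
        }
        return pairs;
    }

    // Similarity + classification: a toy rule that treats case-insensitive
    // equality as "duplicate" (a stand-in for a real similarity measure).
    static boolean isDuplicate(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    // Union-find lookup with path halving.
    static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];
            i = parent[i];
        }
        return i;
    }

    // Clustering: merge all pairs classified as duplicates into connected components.
    static List<List<String>> detectDuplicates(List<String> records) {
        int[] parent = new int[records.size()];
        for (int i = 0; i < parent.length; i++) {
            parent[i] = i;
        }
        for (int[] pair : selectCandidates(records)) {
            if (isDuplicate(records.get(pair[0]), records.get(pair[1]))) {
                parent[find(parent, pair[0])] = find(parent, pair[1]);
            }
        }
        // LinkedHashMap keeps clusters in order of their first member.
        Map<Integer, List<String>> clusters = new LinkedHashMap<>();
        for (int i = 0; i < parent.length; i++) {
            clusters.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(records.get(i));
        }
        return new ArrayList<>(clusters.values());
    }

    // Fusion + deduplication: emit one representative per cluster
    // (assumed here: simply the first record seen in the cluster).
    static List<String> deduplicate(List<String> records) {
        List<String> fused = new ArrayList<>();
        for (List<String> cluster : detectDuplicates(records)) {
            fused.add(cluster.get(0));
        }
        return fused;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Alice", "alice", "Bob", "ALICE", "bob", "Carol");
        System.out.println(deduplicate(input)); // prints [Alice, Bob, Carol]
    }
}
```

This sketch processes a fully materialized list, which corresponds to the offline case in the table; an online variant would have to update blocks and clusters incrementally as each record arrives.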