Interface OfflineDuplicateDetection<C extends java.lang.Comparable<C>,T>
- All Superinterfaces:
DuplicateDetection<C,T>
- Functional Interface:
- This is a functional interface and can therefore be used as the assignment target for a lambda expression or method reference.
@FunctionalInterface public interface OfflineDuplicateDetection<C extends java.lang.Comparable<C>,T> extends DuplicateDetection<C,T>
The offline duplicate detection returns all duplicate records within a dataset. Consider a dataset of records A, B, and A', where A and A' are duplicates. The final result will be [(A, A'), (B)].
The actual implementation may use any means necessary to find duplicates and to ensure proper transitivity: if (A, B) are duplicates and (B, C) are duplicates, then (A, C) must also be duplicates.
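The transitivity requirement can be illustrated with a small standalone sketch. It does not use or reproduce this library's API: the class `TransitiveClusteringSketch`, its `cluster` method, and the use of plain strings as records are all hypothetical and chosen only for illustration. It assumes the duplicate pairs have already been found by some comparison step and merges them into clusters with a simple union-find structure, which is one of several possible ways to guarantee transitivity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of com.bakdata.dedupe.
class TransitiveClusteringSketch {

    // Follows parent links until reaching the representative of the record's cluster.
    static String find(Map<String, String> parent, String record) {
        String root = record;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        return root;
    }

    // Groups records into clusters so that duplicate pairs are merged transitively.
    // Each entry of duplicatePairs is assumed to be a two-element list of records.
    static List<List<String>> cluster(List<String> records, List<List<String>> duplicatePairs) {
        // Initially, every record is its own cluster representative.
        Map<String, String> parent = new LinkedHashMap<>();
        for (String record : records) {
            parent.put(record, record);
        }
        // Merge the clusters of both records of each duplicate pair.
        for (List<String> pair : duplicatePairs) {
            String left = find(parent, pair.get(0));
            String right = find(parent, pair.get(1));
            if (!left.equals(right)) {
                parent.put(right, left);
            }
        }
        // Collect the records per representative, preserving the input order.
        Map<String, List<String>> clusters = new LinkedHashMap<>();
        for (String record : records) {
            clusters.computeIfAbsent(find(parent, record), key -> new ArrayList<>()).add(record);
        }
        return new ArrayList<>(clusters.values());
    }

    public static void main(String[] args) {
        // The dataset from the description: A and A' are duplicates, B is unique.
        List<String> records = List.of("A", "B", "A'");
        List<List<String>> pairs = List.of(List.of("A", "A'"));
        System.out.println(cluster(records, pairs)); // prints [[A, A'], [B]]
    }
}
```

With the pairs (A, B) and (B, C) as input, the same sketch places A, B, and C in a single cluster even though (A, C) was never reported as a pair.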
- Implementation Requirements:
- Offline algorithms are generally assumed to be stateless. Deviations from this assumption need to be documented.
Method Summary
Methods inherited from interface com.bakdata.dedupe.duplicate_detection.DuplicateDetection
detectDuplicates, materializeDuplicates