A B C D E F G H I J K L M N O P R S T U V W
All Classes All Packages
A
- AbstractMatcher(List<? extends Queue<List<Integer>>>, List<? extends Queue<List<Integer>>>) - Constructor for class com.bakdata.dedupe.matching.AbstractStableMarriage.AbstractMatcher
- AbstractStableMarriage<T> - Class in com.bakdata.dedupe.matching
- AbstractStableMarriage() - Constructor for class com.bakdata.dedupe.matching.AbstractStableMarriage
- AbstractStableMarriage.AbstractMatcher - Class in com.bakdata.dedupe.matching
- AbstractStableMarriage.Matcher - Interface in com.bakdata.dedupe.matching
- add(double, SimilarityMeasure<R>) - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedAggregatingSimilarityMeasureBuilder
- add(double, Function<R, ? extends T>, SimilarityMeasure<? super T>) - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedAggregatingSimilarityMeasureBuilder
- add(T) - Method in class com.bakdata.dedupe.clustering.Cluster
- AdditionalFieldMergeBuilder(Merge.IllTypedFieldMergeBuilder<F, F, R>) - Constructor for class com.bakdata.dedupe.fusion.Merge.AdditionalFieldMergeBuilder
- AggregatingSimilarityMeasure<T> - Class in com.bakdata.dedupe.similarity
-
Aggregates similarities with a given aggregator.
- AggregatingSimilarityMeasure(ToDoubleFunction<? super DoubleStream>, SimilarityMeasure<? super T>...) - Constructor for class com.bakdata.dedupe.similarity.AggregatingSimilarityMeasure
-
Creates an AggregatingSimilarityMeasure with the given aggregator and the similarity measures.
- AggregatingSimilarityMeasure(ToDoubleFunction<? super DoubleStream>, Iterable<? extends SimilarityMeasure<? super T>>) - Constructor for class com.bakdata.dedupe.similarity.AggregatingSimilarityMeasure
-
Creates an AggregatingSimilarityMeasure with the given aggregator and the similarity measures.
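To illustrate what "aggregator" means in the entries above, the following sketch reduces several similarities with a ToDoubleFunction<? super DoubleStream>, the functional type that appears in the constructor signatures. The Sim interface, the aggregate helper, and the class name are hypothetical stand-ins written for this example; they are not the library's classes, and the library's actual handling of contexts or missing values is not modeled.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;
import java.util.stream.DoubleStream;

final class AggregationSketch {
    /** Hypothetical stand-in for a similarity measure: returns a value in [0, 1]. */
    interface Sim<T> {
        double similarity(T left, T right);
    }

    /** Applies every measure to the pair and reduces the results with the aggregator. */
    static <T> double aggregate(ToDoubleFunction<? super DoubleStream> aggregator,
            List<Sim<? super T>> measures, T left, T right) {
        return aggregator.applyAsDouble(
                measures.stream().mapToDouble(m -> m.similarity(left, right)));
    }

    public static void main(String[] args) {
        Sim<String> exact = (a, b) -> a.equals(b) ? 1.0 : 0.0;
        Sim<String> sameLength = (a, b) -> a.length() == b.length() ? 1.0 : 0.0;
        List<Sim<? super String>> measures = List.of(exact, sameLength);
        // Aggregate with the minimum; an empty stream falls back to 0.
        System.out.println(aggregate(s -> s.min().orElse(0.0), measures, "anna", "anne"));
    }
}
```

Swapping the lambda for `s -> s.max().orElse(0.0)` or `s -> s.average().orElse(0.0)` changes the aggregation without touching the measures.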
- AggregatingSimilarityMeasure.AggregatingSimilarityMeasureBuilder<T> - Class in com.bakdata.dedupe.similarity
- aggregator(ToDoubleFunction<? super DoubleStream>) - Method in class com.bakdata.dedupe.similarity.AggregatingSimilarityMeasure.AggregatingSimilarityMeasureBuilder
- aggregator(ToDoubleFunction<Stream<WeightedAggregatingSimilarityMeasure.WeightedValue>>) - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedAggregatingSimilarityMeasureBuilder
- and(T) - Method in class com.bakdata.dedupe.candidate_selection.CompositeValue
-
Adds another value to the composition.
- andThen(ConflictResolution<O, O2>) - Method in interface com.bakdata.dedupe.fusion.ConflictResolution
-
Chains two conflict resolution functions, so that if some values remain unresolved after this conflict resolution, the successor will be applied on these remaining alternatives.
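The chaining described above can be sketched as follows. This is a simplified model, not the library's interface: Resolution, distinct, and longest are hypothetical names, input and output share one type (the real andThen changes the output type), and "unresolved" is assumed to mean that more than one candidate value remains.

```java
import java.util.List;
import java.util.stream.Collectors;

final class ChainSketch {
    /** Hypothetical, simplified conflict resolution: narrows a list of candidate values. */
    interface Resolution<V> {
        List<V> resolve(List<V> candidates);

        /** Applies the successor only if this resolution leaves more than one value. */
        default Resolution<V> andThen(Resolution<V> successor) {
            return candidates -> {
                List<V> remaining = this.resolve(candidates);
                return remaining.size() > 1 ? successor.resolve(remaining) : remaining;
            };
        }
    }

    /** Keeps each value once, in encounter order. */
    static <V> Resolution<V> distinct() {
        return candidates -> candidates.stream().distinct().collect(Collectors.toList());
    }

    /** Keeps all strings that share the maximum length. */
    static Resolution<String> longest() {
        return candidates -> {
            int max = candidates.stream().mapToInt(String::length).max().orElse(0);
            return candidates.stream().filter(c -> c.length() == max).collect(Collectors.toList());
        };
    }

    public static void main(String[] args) {
        Resolution<String> chain = ChainSketch.<String>distinct().andThen(longest());
        System.out.println(chain.resolve(List.of("Anna", "Anna", "Ann")));
    }
}
```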
- andThen(ValueTransformation<? super R, ? extends V>) - Method in interface com.bakdata.dedupe.similarity.ValueTransformation
-
Composes this and the other transformation, such that this transformation is first applied to values and the given transformation is applied afterwards.
- AnnotatedValue<T> - Class in com.bakdata.dedupe.fusion
-
A value with lineage information; in particular, the Source and the timestamp.
- AnnotatedValue(T, Source, LocalDateTime) - Constructor for class com.bakdata.dedupe.fusion.AnnotatedValue
- apply(FusedValue<T>) - Method in interface com.bakdata.dedupe.fusion.IncompleteFusionHandler
- assign(Collection<WeightedEdge<T>>) - Method in interface com.bakdata.dedupe.matching.Assigner
-
Finds the set of edges forming a matching that maximizes the sum of the edge weights.
- assign(Collection<WeightedEdge<T>>) - Method in interface com.bakdata.dedupe.matching.BipartiteMatcher
- Assigner<T> - Interface in com.bakdata.dedupe.matching
-
Implements an algorithm that solves an assignment problem.
- assignMaterialized(Collection<WeightedEdge<T>>) - Method in interface com.bakdata.dedupe.matching.Assigner
-
Finds the set of edges forming a matching that maximizes the sum of the edge weights.
- assumeEqualValue() - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
A no-op conflict resolution that will eventually lead to a FusionException when values are not resolved.
B
- beiderMorse() - Static method in class com.bakdata.dedupe.similarity.CommonTransformations
-
Creates a normalizer that turns strings into their Beider-Morse encoding.
- bigram() - Static method in class com.bakdata.dedupe.similarity.CommonTransformations
-
Creates a tokenizer for a string into bigrams; that is, two succeeding characters.
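As a minimal sketch of bigram tokenization (two succeeding characters per token), assuming overlapping windows and no padding: the helper below is hypothetical and does not claim to match how the library treats strings shorter than two characters.

```java
import java.util.ArrayList;
import java.util.List;

final class BigramSketch {
    /** Splits a string into overlapping pairs of succeeding characters. */
    static List<String> bigrams(String value) {
        List<String> tokens = new ArrayList<>();
        // Slide a window of width 2 over the string; shorter inputs yield no tokens.
        for (int i = 0; i + 2 <= value.length(); i++) {
            tokens.add(value.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("dedupe"));
    }
}
```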
- binarize() - Method in interface com.bakdata.dedupe.similarity.SimilarityMeasure
-
Binarizes the similarity returned by this similarity, such that all values >0 result in a similarity of 1.
- BipartiteMatcher<T> - Interface in com.bakdata.dedupe.matching
-
Implements an algorithm that finds a matching in a bipartite graph.
- breakEngangement(Integer, Integer) - Method in class com.bakdata.dedupe.matching.AbstractStableMarriage.AbstractMatcher
- build() - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod.OnlineSortedNeighborhoodMethodBuilder
- build() - Method in class com.bakdata.dedupe.classifier.ClassificationResult.ClassificationResultBuilder
- build() - Method in class com.bakdata.dedupe.classifier.ClassifiedCandidate.ClassifiedCandidateBuilder
- build() - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier.RuleBasedClassifierBuilder
- build() - Method in class com.bakdata.dedupe.clustering.Cluster.ClusterBuilder
- build() - Method in class com.bakdata.dedupe.clustering.ConsistentClustering.ConsistentClusteringBuilder
- build() - Method in class com.bakdata.dedupe.clustering.RefineCluster.RefineClusterBuilder
- build() - Method in class com.bakdata.dedupe.clustering.RefinedTransitiveClosure.RefinedTransitiveClosureBuilder
- build() - Method in class com.bakdata.dedupe.clustering.TransitiveClosure.TransitiveClosureBuilder
- build() - Method in class com.bakdata.dedupe.deduplication.online.FusingOnlineDuplicateDetection.FusingOnlineDuplicateDetectionBuilder
- build() - Method in class com.bakdata.dedupe.duplicate_detection.online.OnlinePairBasedDuplicateDetection.OnlinePairBasedDuplicateDetectionBuilder
- build() - Method in class com.bakdata.dedupe.fusion.ConflictResolutionFusion.ConflictResolutionFusionBuilder
- build() - Method in class com.bakdata.dedupe.fusion.FusionContext.FusionContextBuilder
- build() - Method in class com.bakdata.dedupe.fusion.Merge.AdditionalFieldMergeBuilder
- build() - Method in class com.bakdata.dedupe.fusion.Merge.MergeBuilder
- build() - Method in class com.bakdata.dedupe.similarity.AggregatingSimilarityMeasure.AggregatingSimilarityMeasureBuilder
- build() - Method in class com.bakdata.dedupe.similarity.SimilarityContext.SimilarityContextBuilder
- build() - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedAggregatingSimilarityMeasureBuilder
- builder() - Static method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod
- builder() - Static method in class com.bakdata.dedupe.classifier.ClassificationResult
- builder() - Static method in class com.bakdata.dedupe.classifier.ClassifiedCandidate
- builder() - Static method in class com.bakdata.dedupe.classifier.RuleBasedClassifier
- builder() - Static method in class com.bakdata.dedupe.clustering.Cluster
- builder() - Static method in class com.bakdata.dedupe.clustering.ConsistentClustering
- builder() - Static method in class com.bakdata.dedupe.clustering.RefineCluster
- builder() - Static method in class com.bakdata.dedupe.clustering.RefinedTransitiveClosure
- builder() - Static method in class com.bakdata.dedupe.clustering.TransitiveClosure
- builder() - Static method in class com.bakdata.dedupe.deduplication.online.FusingOnlineDuplicateDetection
- builder() - Static method in class com.bakdata.dedupe.duplicate_detection.online.OnlinePairBasedDuplicateDetection
- builder() - Static method in class com.bakdata.dedupe.fusion.ConflictResolutionFusion
- builder() - Static method in class com.bakdata.dedupe.fusion.FusionContext
- builder() - Static method in class com.bakdata.dedupe.similarity.AggregatingSimilarityMeasure
- builder() - Static method in class com.bakdata.dedupe.similarity.SimilarityContext
- builder() - Static method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure
C
- CachingSimilarity<T,I> - Class in com.bakdata.dedupe.similarity
-
A light-weight caching layer over SimilarityMeasure to allow implementations to repeatedly calculate expensive similarities without implementing a cache on their own.
- CachingSimilarity(SimilarityMeasure<T>, Function<T, I>) - Constructor for class com.bakdata.dedupe.similarity.CachingSimilarity
- calculated(T) - Static method in class com.bakdata.dedupe.fusion.AnnotatedValue
-
Wraps the given value with a calculated source and the current timestamp.
- calculateNonEmptyCollectionSimilarity(C, C, SimilarityContext) - Method in interface com.bakdata.dedupe.similarity.CollectionSimilarityMeasure
-
Calculates the similarity ignoring the trivial cases of empty input collections.
- calculateNonEmptyCollectionSimilarity(C, C, SimilarityContext) - Method in class com.bakdata.dedupe.similarity.CosineSimilarityMeasure
- calculateNonEmptyCollectionSimilarity(C, C, SimilarityContext) - Method in class com.bakdata.dedupe.similarity.MatchingSimilarity
- calculateNonEmptyCollectionSimilarity(C, C, SimilarityContext) - Method in interface com.bakdata.dedupe.similarity.SetSimilarityMeasure
- calculateNonEmptySetSimilarity(Set<E>, Set<E>, SimilarityContext) - Method in interface com.bakdata.dedupe.similarity.SetSimilarityMeasure
-
Calculates the similarity ignoring the trivial cases of empty input collections.
- candidate(Candidate<T>) - Method in class com.bakdata.dedupe.classifier.ClassifiedCandidate.ClassifiedCandidateBuilder
- Candidate<T> - Interface in com.bakdata.dedupe.candidate_selection
-
Represents a candidate pair that was generated with an OfflineCandidateSelection.
- candidateSelection(OnlineCandidateSelection<T>) - Method in class com.bakdata.dedupe.duplicate_detection.online.OnlinePairBasedDuplicateDetection.OnlinePairBasedDuplicateDetectionBuilder
- CandidateSelection<T> - Interface in com.bakdata.dedupe.candidate_selection
-
Selects candidates from a static or dynamic dataset accessible through Iterables.
- canEqual(Object) - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod.Pass
- canEqual(Object) - Method in class com.bakdata.dedupe.clustering.Cluster
- canEqual(Object) - Method in class com.bakdata.dedupe.matching.StronglyStableMarriage
- canEqual(Object) - Method in class com.bakdata.dedupe.matching.WeaklyStableMarriage
- checkSplit(Collection<? extends Cluster<C, T>>, T) - Method in interface com.bakdata.dedupe.clustering.ClusterSplitHandler
-
Checks if the given set of clusters contains more than one element and invokes ClusterSplitHandler.clusterSplit(Cluster, List).
- classification(Classification) - Method in class com.bakdata.dedupe.classifier.ClassificationResult.ClassificationResultBuilder
- Classification - Enum in com.bakdata.dedupe.classifier
-
Contains the possible classification classes.
- ClassificationException - Exception in com.bakdata.dedupe.classifier
-
An exception thrown by a Classifier whenever an exception occurs during classification.
- ClassificationException() - Constructor for exception com.bakdata.dedupe.classifier.ClassificationException
- ClassificationException(String) - Constructor for exception com.bakdata.dedupe.classifier.ClassificationException
- ClassificationException(String, Throwable) - Constructor for exception com.bakdata.dedupe.classifier.ClassificationException
- ClassificationException(String, Throwable, boolean, boolean) - Constructor for exception com.bakdata.dedupe.classifier.ClassificationException
- ClassificationException(Throwable) - Constructor for exception com.bakdata.dedupe.classifier.ClassificationException
- classificationResult(ClassificationResult) - Method in class com.bakdata.dedupe.classifier.ClassifiedCandidate.ClassifiedCandidateBuilder
- ClassificationResult - Class in com.bakdata.dedupe.classifier
-
The classification of a Candidate with additional information.
- ClassificationResult.ClassificationResultBuilder - Class in com.bakdata.dedupe.classifier
- ClassifiedCandidate<T> - Class in com.bakdata.dedupe.classifier
-
Contains a ClassificationResult with the corresponding Candidate.
- ClassifiedCandidate(Candidate<T>, ClassificationResult) - Constructor for class com.bakdata.dedupe.classifier.ClassifiedCandidate
- ClassifiedCandidate.ClassifiedCandidateBuilder<T> - Class in com.bakdata.dedupe.classifier
- classifier(Classifier<T>) - Method in class com.bakdata.dedupe.clustering.RefineCluster.RefineClusterBuilder
- classifier(Classifier<T>) - Method in class com.bakdata.dedupe.duplicate_detection.online.OnlinePairBasedDuplicateDetection.OnlinePairBasedDuplicateDetectionBuilder
- Classifier<T> - Interface in com.bakdata.dedupe.classifier
-
Classifies an OnlineCandidate as duplicate, non-duplicate, or other Classifications.
- classify(Candidate<T>) - Method in interface com.bakdata.dedupe.classifier.Classifier
-
Classifies the OnlineCandidate as duplicate, non-duplicate, or other Classifications.
- classify(Candidate<T>) - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier
- classify(Candidate<T>) - Method in class com.bakdata.dedupe.classifier.OracleClassifier
- classifyCandidate(Candidate<T>) - Method in interface com.bakdata.dedupe.classifier.Classifier
-
Classifies the Candidate as duplicate, non-duplicate, or other Classifications and stores the ClassificationResult together with the candidate.
- clearPasses() - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod.OnlineSortedNeighborhoodMethodBuilder
- clearRules() - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier.RuleBasedClassifierBuilder
- clearSources() - Method in class com.bakdata.dedupe.fusion.ConflictResolutionFusion.ConflictResolutionFusionBuilder
- clearWeightedSimilarities() - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedAggregatingSimilarityMeasureBuilder
- closure(TransitiveClosure<C, T, I>) - Method in class com.bakdata.dedupe.clustering.RefinedTransitiveClosure.RefinedTransitiveClosureBuilder
- cluster(Stream<ClassifiedCandidate<T>>) - Method in interface com.bakdata.dedupe.clustering.Clustering
-
Creates a coherent Cluster from a list of ClassifiedCandidates.
- cluster(Stream<ClassifiedCandidate<T>>) - Method in class com.bakdata.dedupe.clustering.ConsistentClustering
- cluster(Stream<ClassifiedCandidate<T>>) - Method in class com.bakdata.dedupe.clustering.OracleClustering
- cluster(Stream<ClassifiedCandidate<T>>) - Method in class com.bakdata.dedupe.clustering.RefinedTransitiveClosure
- cluster(Stream<ClassifiedCandidate<T>>) - Method in class com.bakdata.dedupe.clustering.TransitiveClosure
- Cluster<C extends java.lang.Comparable<C>,T> - Class in com.bakdata.dedupe.clustering
-
A cluster is a coherent collection of duplicate records.
- Cluster(C) - Constructor for class com.bakdata.dedupe.clustering.Cluster
- Cluster(C, List<T>) - Constructor for class com.bakdata.dedupe.clustering.Cluster
- Cluster.ClusterBuilder<C extends java.lang.Comparable<C>,T> - Class in com.bakdata.dedupe.clustering
- clusterDuplicates(Iterable<? extends Candidate<T>>) - Method in class com.bakdata.dedupe.clustering.TransitiveClosure
- clusterIdGenerator(Function<? super Iterable<? extends T>, C>) - Method in class com.bakdata.dedupe.clustering.RefineCluster.RefineClusterBuilder
- clusterIdGenerator(Function<? super Iterable<? extends T>, C>) - Method in class com.bakdata.dedupe.clustering.TransitiveClosure.TransitiveClosureBuilder
- ClusterIdGenerators - Class in com.bakdata.dedupe.clustering
-
A collection of typical cluster id generators.
- clusterIndex(Map<I, Cluster<C, T>>) - Method in class com.bakdata.dedupe.clustering.TransitiveClosure.TransitiveClosureBuilder
- clustering(Clustering<C, T>) - Method in class com.bakdata.dedupe.clustering.ConsistentClustering.ConsistentClusteringBuilder
- clustering(Clustering<C, T>) - Method in class com.bakdata.dedupe.duplicate_detection.online.OnlinePairBasedDuplicateDetection.OnlinePairBasedDuplicateDetectionBuilder
- Clustering<C extends java.lang.Comparable<C>,T> - Interface in com.bakdata.dedupe.clustering
-
A clustering algorithm takes a list of ClassifiedCandidates and creates a coherent Cluster, such that all pairs of records inside the cluster are duplicate and no record outside the cluster is a duplicate with any record inside the cluster.
- Clusters - Class in com.bakdata.dedupe.clustering
-
Utility methods for Cluster.
- clusterSplit(Cluster<C, T>, List<Cluster<C, T>>) - Method in interface com.bakdata.dedupe.clustering.ClusterSplitHandler
-
Invoked when an already existing cluster is split up during the Clustering of (online) deduplication.
- ClusterSplitHandler<C extends java.lang.Comparable<C>,T> - Interface in com.bakdata.dedupe.clustering
-
A callback that is invoked when an already existing cluster is split up during the Clustering of (online) deduplication.
- codec(StringEncoder) - Static method in class com.bakdata.dedupe.similarity.CommonTransformations
-
Wraps a StringEncoder into a ValueTransformation.
- CollectionSimilarityMeasure<C extends java.util.Collection<? extends E>,E> - Interface in com.bakdata.dedupe.similarity
-
A SimilarityMeasure that is defined over Collections.
- colognePhonetic() - Static method in class com.bakdata.dedupe.similarity.CommonTransformations
-
Creates a normalizer that turns strings into their Cologne phonetic encoding.
- com.bakdata.dedupe.candidate_selection - package com.bakdata.dedupe.candidate_selection
-
Base data structures shared by online and offline candidate selections that choose promising pairs to limit the search space for duplicates.
- com.bakdata.dedupe.candidate_selection.offline - package com.bakdata.dedupe.candidate_selection.offline
-
Interfaces and implementations for offline candidate selections that choose promising pairs to limit search space for duplicates in a materialized dataset.
- com.bakdata.dedupe.candidate_selection.online - package com.bakdata.dedupe.candidate_selection.online
-
Interfaces and implementations for online candidate selections that choose promising pairs to limit search space for duplicates in a streaming dataset.
- com.bakdata.dedupe.classifier - package com.bakdata.dedupe.classifier
-
Interfaces, data structures, and implementations for the classification of Candidate pairs into duplicates and non-duplicates.
- com.bakdata.dedupe.clustering - package com.bakdata.dedupe.clustering
-
Clusters ClassifiedCandidates into coherent clusters.
- com.bakdata.dedupe.deduplication - package com.bakdata.dedupe.deduplication
-
Provides interfaces and implementations for a full deduplication process, which ensures that no duplicate record is emitted.
- com.bakdata.dedupe.deduplication.offline - package com.bakdata.dedupe.deduplication.offline
-
Full offline deduplication for materialized data.
- com.bakdata.dedupe.deduplication.online - package com.bakdata.dedupe.deduplication.online
-
Full online deduplication for streaming data.
- com.bakdata.dedupe.duplicate_detection - package com.bakdata.dedupe.duplicate_detection
-
Provides base interfaces and implementations for finding duplicate clusters.
- com.bakdata.dedupe.duplicate_detection.offline - package com.bakdata.dedupe.duplicate_detection.offline
-
Offline duplicate detection to find duplicate clusters in materialized data.
- com.bakdata.dedupe.duplicate_detection.online - package com.bakdata.dedupe.duplicate_detection.online
-
Online duplicate detection to find duplicate clusters in streaming data.
- com.bakdata.dedupe.fusion - package com.bakdata.dedupe.fusion
-
Provides means to reconcile a duplicate cluster into a consistent representation.
- com.bakdata.dedupe.matching - package com.bakdata.dedupe.matching
-
Assigns and matches nodes of a bipartite graph.
- com.bakdata.dedupe.similarity - package com.bakdata.dedupe.similarity
-
Data structures, interfaces, and implementations to define similarity measures that are ultimately used to detect duplicates.
- com.bakdata.util - package com.bakdata.util
-
Utility classes that should not be deemed public API.
- CommonConflictResolutions - Class in com.bakdata.dedupe.fusion
-
Provides factory methods for common conflict resolutions.
- CommonSimilarityMeasures - Class in com.bakdata.dedupe.similarity
-
A utility class that offers factory methods for common similarity measures.
- CommonTransformations - Class in com.bakdata.dedupe.similarity
-
A utility class that offers factory methods for common similarity transformations.
- compareTo(CompositeValue<T1, T2>) - Method in class com.bakdata.dedupe.candidate_selection.CompositeValue
- compose(T1, T2) - Static method in class com.bakdata.dedupe.candidate_selection.CompositeValue
-
Creates a composite value of the two values.
- CompositeValue<T1 extends java.lang.Comparable<T1>,T2 extends java.lang.Comparable<T2>> - Class in com.bakdata.dedupe.candidate_selection
-
Helper for composed (sorting) keys, which will perform position-wise comparison of the key elements.
- CompositeValue(T1, T2) - Constructor for class com.bakdata.dedupe.candidate_selection.CompositeValue
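Position-wise comparison of a composed sorting key can be sketched as below: the first components decide the order, and the second components are consulted only on a tie. The Key class is a hypothetical two-element illustration that assumes non-null components; it is not the library's CompositeValue.

```java
final class CompositeKeySketch {
    /** Hypothetical two-part key that compares its parts from left to right. */
    static final class Key<A extends Comparable<A>, B extends Comparable<B>>
            implements Comparable<Key<A, B>> {
        final A first;
        final B second;

        Key(A first, B second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public int compareTo(Key<A, B> other) {
            // Compare position by position; later positions only break ties.
            int byFirst = first.compareTo(other.first);
            return byFirst != 0 ? byFirst : second.compareTo(other.second);
        }
    }

    public static void main(String[] args) {
        Key<String, Integer> a = new Key<>("Smith", 1990);
        Key<String, Integer> b = new Key<>("Smith", 1985);
        System.out.println(Integer.signum(a.compareTo(b)));
    }
}
```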
- confidence(double) - Method in class com.bakdata.dedupe.classifier.ClassificationResult.ClassificationResultBuilder
- ConflictResolution<I,O> - Interface in com.bakdata.dedupe.fusion
-
Solves a conflict during resolution - two or more different values that need to be merged into a single value for the fused representation.
- ConflictResolutionFusion<R> - Class in com.bakdata.dedupe.fusion
-
A fusion approach based on conflict resolution.
- ConflictResolutionFusion.ConflictResolutionFusionBuilder<R> - Class in com.bakdata.dedupe.fusion
- ConsistentClustering<C extends java.lang.Comparable<C>,T,I extends java.lang.Comparable<? super I>> - Class in com.bakdata.dedupe.clustering
-
Wraps another clustering and keeps clusters together, when the wrapped clustering would split it.
Example: consider a stable marriage-based clustering where A1-B have been previously matched and subsequently clustered.
- ConsistentClustering.ConsistentClusteringBuilder<C extends java.lang.Comparable<C>,T,I extends java.lang.Comparable<? super I>> - Class in com.bakdata.dedupe.clustering
- contains(T) - Method in class com.bakdata.dedupe.clustering.Cluster
- contextSupplier(Supplier<SimilarityContext>) - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier.RuleBasedClassifierBuilder
- convertingBack(ConflictResolution<I, F>) - Method in class com.bakdata.dedupe.fusion.Merge.IllTypedFieldMergeBuilder
- convertingWith(ConflictResolution<F, I>) - Method in class com.bakdata.dedupe.fusion.Merge.AdditionalFieldMergeBuilder
- convertingWith(ConflictResolution<F, I>) - Method in class com.bakdata.dedupe.fusion.Merge.FieldMergeBuilder
- corresponding(ResolutionTag<?>) - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Retains all values coming from the same source as the values resolved by the ResolutionTag.
- corresponding(ResolutionTag<?>) - Method in class com.bakdata.dedupe.fusion.Merge.FieldMergeBuilder
- correspondingToPrevious() - Method in class com.bakdata.dedupe.fusion.Merge.FieldMergeBuilder
- cosine() - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Calculates the cosine similarity measure over two bags of elements.
- CosineSimilarityMeasure<C extends java.util.Collection<? extends E>,E> - Class in com.bakdata.dedupe.similarity
-
Calculates the cosine similarity measure over two bags of elements.
- CosineSimilarityMeasure() - Constructor for class com.bakdata.dedupe.similarity.CosineSimilarityMeasure
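For readers unfamiliar with cosine similarity over bags, here is a minimal sketch: each bag becomes a vector of element counts, and the similarity is the dot product divided by the product of the vector lengths. The class and method names are hypothetical, the inputs are assumed to be non-empty (the index lists a separate calculateNonEmptyCollectionSimilarity method for that case), and no claim is made that the library computes it exactly this way.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CosineSketch {
    /** Counts how often each element occurs in the bag. */
    static <E> Map<E, Integer> counts(Collection<? extends E> bag) {
        Map<E, Integer> counts = new HashMap<>();
        for (E element : bag) {
            counts.merge(element, 1, Integer::sum);
        }
        return counts;
    }

    /** Cosine of the angle between the two count vectors; both bags must be non-empty. */
    static <E> double cosine(Collection<? extends E> left, Collection<? extends E> right) {
        Map<E, Integer> leftCounts = counts(left);
        Map<E, Integer> rightCounts = counts(right);
        double dot = 0;
        for (Map.Entry<E, Integer> entry : leftCounts.entrySet()) {
            dot += entry.getValue() * rightCounts.getOrDefault(entry.getKey(), 0);
        }
        return dot / (norm(leftCounts) * norm(rightCounts));
    }

    /** Euclidean length of a count vector. */
    private static double norm(Map<?, Integer> counts) {
        double sumOfSquares = 0;
        for (int count : counts.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        System.out.println(cosine(List.of("a", "a", "b"), List.of("a", "b", "b")));
    }
}
```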
- createMatcher(List<? extends Queue<List<Integer>>>, List<? extends Queue<List<Integer>>>) - Method in class com.bakdata.dedupe.matching.AbstractStableMarriage
- createMatcher(List<? extends Queue<List<Integer>>>, List<? extends Queue<List<Integer>>>) - Method in class com.bakdata.dedupe.matching.StronglyStableMarriage
- createMatcher(List<? extends Queue<List<Integer>>>, List<? extends Queue<List<Integer>>>) - Method in class com.bakdata.dedupe.matching.WeaklyStableMarriage
- cutoff(double) - Method in class com.bakdata.dedupe.similarity.CutoffSimiliarityMeasure
- cutoff(double) - Method in class com.bakdata.dedupe.similarity.Levenshtein
- cutoff(double) - Method in interface com.bakdata.dedupe.similarity.SimilarityMeasure
-
Cuts the similarity returned by this similarity, such that all values <threshold result in a similarity of 0, and all values [threshold, 1] are left untouched.
- cutoff(double) - Method in class com.bakdata.dedupe.similarity.TransformingSimilarityMeasure
- cutoff(double, double) - Static method in class com.bakdata.dedupe.similarity.CutoffSimiliarityMeasure
-
Cuts the similarity, such that all values <threshold result in a similarity of 0, and all values [threshold, 1] are left untouched.
- CutoffSimiliarityMeasure<T> - Class in com.bakdata.dedupe.similarity
-
Cuts the similarity returned by this similarity, such that all values <threshold result in a similarity of 0, and all values [threshold, 1] are left untouched.
- CutoffSimiliarityMeasure(SimilarityMeasure<T>, double) - Constructor for class com.bakdata.dedupe.similarity.CutoffSimiliarityMeasure
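The cutoff rule described in the entries above amounts to a one-line function. The sketch below is a hypothetical helper, and the parameter order (similarity first, threshold second) is an assumption; the index only shows that a static cutoff(double, double) exists.

```java
final class CutoffSketch {
    /** Returns 0 for similarities below the threshold and the similarity itself otherwise. */
    static double cutoff(double similarity, double threshold) {
        return similarity < threshold ? 0.0 : similarity;
    }

    public static void main(String[] args) {
        System.out.println(cutoff(0.4, 0.5));
        System.out.println(cutoff(0.9, 0.5));
    }
}
```

Note that a value exactly equal to the threshold is kept, matching the closed interval [threshold, 1].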
D
- deduplicate(Stream<? extends T>) - Method in interface com.bakdata.dedupe.deduplication.Deduplication
-
Deduplicates the dataset.
- deduplicate(Stream<? extends T>) - Method in interface com.bakdata.dedupe.deduplication.online.OnlineDeduplication
- deduplicate(T) - Method in class com.bakdata.dedupe.deduplication.online.FusingOnlineDuplicateDetection
- deduplicate(T) - Method in interface com.bakdata.dedupe.deduplication.online.OnlineDeduplication
-
Deduplicates the record with all previously seen records.
- Deduplication<T> - Interface in com.bakdata.dedupe.deduplication
-
A full deduplication process, which ensures that no duplicate record is emitted.
- defaultClassificationResult(ClassificationResult) - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier.RuleBasedClassifierBuilder
- defaultRule(SimilarityMeasure<? super T>, double) - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier.RuleBasedClassifierBuilder
-
Creates a threshold rule named "default" that is always applied.
- defaultWindowSize(int) - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod.OnlineSortedNeighborhoodMethodBuilder
- delete(Integer, Integer) - Method in class com.bakdata.dedupe.matching.AbstractStableMarriage.AbstractMatcher
- demoteToNonDuplicate() - Static method in interface com.bakdata.dedupe.duplicate_detection.PossibleDuplicateHandler
-
Treats possible duplicates as non-duplicates.
- detectDuplicates(Stream<? extends T>) - Method in interface com.bakdata.dedupe.duplicate_detection.DuplicateDetection
-
Finds all duplicates in the dataset.
- detectDuplicates(Stream<? extends T>) - Method in interface com.bakdata.dedupe.duplicate_detection.online.OnlineDuplicateDetection
- detectDuplicates(T) - Method in interface com.bakdata.dedupe.duplicate_detection.online.OnlineDuplicateDetection
-
Returns all clusters that have been affected by the new record.
- detectDuplicates(T) - Method in class com.bakdata.dedupe.duplicate_detection.online.OnlinePairBasedDuplicateDetection
- distinct() - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Returns the list of values that are unique.
- doesNotApply() - Static method in class com.bakdata.dedupe.classifier.RuleBasedClassifier.Rule
-
Indicates that this similarity measure cannot be applied (e.g., precondition not satisfied).
- dontFuse() - Static method in interface com.bakdata.dedupe.fusion.IncompleteFusionHandler
- DUPLICATE - com.bakdata.dedupe.classifier.Classification
-
A sure duplicate.
- duplicateDetection(OnlineDuplicateDetection<C, T>) - Method in class com.bakdata.dedupe.deduplication.online.FusingOnlineDuplicateDetection.FusingOnlineDuplicateDetectionBuilder
- DuplicateDetection<C extends java.lang.Comparable<C>,T> - Interface in com.bakdata.dedupe.duplicate_detection
-
A duplicate detection algorithm processes a dataset of records and returns the distinct Clusters.
E
- earliest() - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Selects all values with the same earliest AnnotatedValue.getDateTime().
- elements(List<T>) - Method in class com.bakdata.dedupe.clustering.Cluster.ClusterBuilder
- engagements - Variable in class com.bakdata.dedupe.matching.AbstractStableMarriage.AbstractMatcher
- equality() - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
A similarity measure that is 1 if left.equals(right) or 0 otherwise.
- equality(SimilarityMeasure<T>) - Static method in class com.bakdata.dedupe.similarity.CachingSimilarity
-
Creates a cache based on the object equality; that is, the cache triggers if two equal instances are compared.
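A cache keyed on object equality might look roughly like the sketch below: results are stored under a key derived from both inputs, so comparing two equal instances again skips the expensive computation. CachingSketch is a hypothetical illustration that uses ToDoubleBiFunction in place of the library's SimilarityMeasure, assumes non-null ids, and treats (a, b) and (b, a) as different keys; the real class may differ on all of these points.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToDoubleBiFunction;

final class CachingSketch<T, I> {
    private final ToDoubleBiFunction<T, T> measure;
    private final Function<T, I> idExtractor;
    private final Map<List<I>, Double> cache = new HashMap<>();
    /** Number of times the wrapped measure was actually invoked. */
    int misses = 0;

    CachingSketch(ToDoubleBiFunction<T, T> measure, Function<T, I> idExtractor) {
        this.measure = measure;
        this.idExtractor = idExtractor;
    }

    /** Uses the values themselves as cache ids, so equal instances share an entry. */
    static <T> CachingSketch<T, T> equality(ToDoubleBiFunction<T, T> measure) {
        return new CachingSketch<T, T>(measure, Function.<T>identity());
    }

    double similarity(T left, T right) {
        List<I> key = List.of(idExtractor.apply(left), idExtractor.apply(right));
        Double cached = cache.get(key);
        if (cached == null) {
            // Cache miss: compute once and remember the result.
            misses++;
            cached = measure.applyAsDouble(left, right);
            cache.put(key, cached);
        }
        return cached;
    }

    public static void main(String[] args) {
        CachingSketch<String, String> cached = equality((a, b) -> a.equalsIgnoreCase(b) ? 1.0 : 0.0);
        cached.similarity("Anna", "ANNA");
        cached.similarity(new String("Anna"), new String("ANNA"));
        System.out.println(cached.misses);
    }
}
```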
- equals(Object) - Method in class com.bakdata.dedupe.candidate_selection.CompositeValue
- equals(Object) - Method in class com.bakdata.dedupe.candidate_selection.offline.OfflineCandidate
- equals(Object) - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineCandidate
- equals(Object) - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod
- equals(Object) - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod.Pass
- equals(Object) - Method in class com.bakdata.dedupe.candidate_selection.SortingKey
- equals(Object) - Method in class com.bakdata.dedupe.classifier.ClassificationResult
- equals(Object) - Method in class com.bakdata.dedupe.classifier.ClassifiedCandidate
- equals(Object) - Method in class com.bakdata.dedupe.classifier.OracleClassifier
- equals(Object) - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier
- equals(Object) - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier.Rule
- equals(Object) - Method in class com.bakdata.dedupe.clustering.Cluster
- equals(Object) - Method in class com.bakdata.dedupe.clustering.ConsistentClustering
- equals(Object) - Method in class com.bakdata.dedupe.clustering.OracleClustering
- equals(Object) - Method in class com.bakdata.dedupe.clustering.RefineCluster
- equals(Object) - Method in class com.bakdata.dedupe.clustering.RefinedTransitiveClosure
- equals(Object) - Method in class com.bakdata.dedupe.clustering.TransitiveClosure
- equals(Object) - Method in class com.bakdata.dedupe.deduplication.online.FusingOnlineDuplicateDetection
- equals(Object) - Method in class com.bakdata.dedupe.duplicate_detection.online.OnlinePairBasedDuplicateDetection
- equals(Object) - Method in class com.bakdata.dedupe.fusion.AnnotatedValue
- equals(Object) - Method in class com.bakdata.dedupe.fusion.ConflictResolutionFusion
- equals(Object) - Method in class com.bakdata.dedupe.fusion.FusedValue
- equals(Object) - Method in class com.bakdata.dedupe.fusion.FusionContext
- equals(Object) - Method in class com.bakdata.dedupe.fusion.Merge.AdditionalFieldMergeBuilder
- equals(Object) - Method in class com.bakdata.dedupe.fusion.Merge
- equals(Object) - Method in class com.bakdata.dedupe.fusion.Merge.FieldMergeBuilder
- equals(Object) - Method in class com.bakdata.dedupe.fusion.Merge.IllTypedFieldMergeBuilder
- equals(Object) - Method in class com.bakdata.dedupe.fusion.Merge.MergeBuilder
- equals(Object) - Method in class com.bakdata.dedupe.fusion.ResolutionTag
- equals(Object) - Method in class com.bakdata.dedupe.fusion.Source
- equals(Object) - Method in class com.bakdata.dedupe.matching.StronglyStableMarriage
- equals(Object) - Method in class com.bakdata.dedupe.matching.WeaklyStableMarriage
- equals(Object) - Method in class com.bakdata.dedupe.matching.WeightedEdge
- equals(Object) - Method in class com.bakdata.dedupe.similarity.AggregatingSimilarityMeasure
- equals(Object) - Method in class com.bakdata.dedupe.similarity.CachingSimilarity
- equals(Object) - Method in class com.bakdata.dedupe.similarity.CutoffSimiliarityMeasure
- equals(Object) - Method in class com.bakdata.dedupe.similarity.Levenshtein
- equals(Object) - Method in class com.bakdata.dedupe.similarity.MatchingSimilarity
- equals(Object) - Method in class com.bakdata.dedupe.similarity.SimilarityContext
- equals(Object) - Method in class com.bakdata.dedupe.similarity.TransformingSimilarityMeasure
- equals(Object) - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure
- equals(Object) - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedSimilarity
- equals(Object) - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedValue
- equals(Object) - Method in class com.bakdata.util.ExceptionContext
- equals(Object) - Method in class com.bakdata.util.FunctionalClass
- equals(Object) - Method in class com.bakdata.util.FunctionalConstructor
- equals(Object) - Method in class com.bakdata.util.FunctionalMethod
- equals(Object) - Method in class com.bakdata.util.FunctionalProperty
- ExceptionContext - Class in com.bakdata.util
-
The exception context allows the safe execution of code that may throw an exception.
- ExceptionContext() - Constructor for class com.bakdata.util.ExceptionContext
- explanation(String) - Method in class com.bakdata.dedupe.classifier.ClassificationResult.ClassificationResultBuilder
F
- field(FunctionalProperty<R, F>) - Method in class com.bakdata.dedupe.fusion.Merge.MergeBuilder
- field(FunctionalProperty<R, F2>) - Method in class com.bakdata.dedupe.fusion.Merge.AdditionalFieldMergeBuilder
- field(String) - Method in class com.bakdata.dedupe.fusion.Merge.AdditionalFieldMergeBuilder
- field(String) - Method in class com.bakdata.dedupe.fusion.Merge.MergeBuilder
- field(String) - Method in class com.bakdata.util.FunctionalClass
-
Returns the functional field with the given name.
- field(Function<R, F>, BiConsumer<R, F>) - Method in class com.bakdata.dedupe.fusion.Merge.MergeBuilder
- field(Function<R, F2>, BiConsumer<R, F2>) - Method in class com.bakdata.dedupe.fusion.Merge.AdditionalFieldMergeBuilder
- FieldMergeBuilder(Merge.MergeBuilder<R>, Function<R, F>, BiConsumer<R, F>) - Constructor for class com.bakdata.dedupe.fusion.Merge.FieldMergeBuilder
- first() - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Picks the first value out of the bag of values.
- first(SimilarityMeasure<? super T>...) - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Returns the first similarity of the given similarity measures that is not SimilarityMeasure.unknown().
- first(Iterable<? extends SimilarityMeasure<? super T>>) - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Returns the first similarity of the given similarity measures that is not SimilarityMeasure.unknown().
- FunctionalClass<T> - Class in com.bakdata.util
-
A wrapper around Class that can be used to extract callable lambdas to methods and fields.
- FunctionalConstructor<T> - Class in com.bakdata.util
-
A lambda wrapper around the no-arg constructor of a class.
- FunctionalConstructor(Constructor<T>) - Constructor for class com.bakdata.util.FunctionalConstructor
- FunctionalMethod<T> - Class in com.bakdata.util
-
A lambda wrapper around a method of a class.
- FunctionalMethod(Method) - Constructor for class com.bakdata.util.FunctionalMethod
- FunctionalProperty<T,F> - Class in com.bakdata.util
-
A lambda wrapper around a property of a class.
- FunctionalProperty(PropertyDescriptor) - Constructor for class com.bakdata.util.FunctionalProperty
- fuse(Cluster<?, T>) - Method in interface com.bakdata.dedupe.fusion.Fusion
-
Fuses a cluster of duplicates to one representation.
- fuse(Cluster<?, R>) - Method in class com.bakdata.dedupe.fusion.ConflictResolutionFusion
- fusedValue(Cluster<?, T>, IncompleteFusionHandler<T>) - Method in interface com.bakdata.dedupe.fusion.Fusion
-
Returns the fused value for a cluster of duplicates.
- FusedValue<T> - Class in com.bakdata.dedupe.fusion
-
A fused value with contextual information.
- FusedValue(T, Cluster<?, T>, List<Exception>) - Constructor for class com.bakdata.dedupe.fusion.FusedValue
- FusingOnlineDuplicateDetection<C extends java.lang.Comparable<C>,T> - Class in com.bakdata.dedupe.deduplication.online
-
A full online deduplication process, which retrieves duplicate clusters through OnlineDeduplication and fuses the duplicate clusters into reconciled records.
- FusingOnlineDuplicateDetection.FusingOnlineDuplicateDetectionBuilder<C extends java.lang.Comparable<C>,T> - Class in com.bakdata.dedupe.deduplication.online
- fusion(Fusion<T>) - Method in class com.bakdata.dedupe.deduplication.online.FusingOnlineDuplicateDetection.FusingOnlineDuplicateDetectionBuilder
- Fusion<T> - Interface in com.bakdata.dedupe.fusion
-
Fuses a cluster of duplicates to one representation.
- FusionContext - Class in com.bakdata.dedupe.fusion
-
A fusion context captures exceptions in an ExceptionContext during resolution and provides additional context values.
- FusionContext.FusionContextBuilder - Class in com.bakdata.dedupe.fusion
- FusionException - Exception in com.bakdata.dedupe.fusion
-
An exception thrown by a ConflictResolution whenever an exception during fusion occurs.
- FusionException() - Constructor for exception com.bakdata.dedupe.fusion.FusionException
- FusionException(String) - Constructor for exception com.bakdata.dedupe.fusion.FusionException
- FusionException(String, Throwable) - Constructor for exception com.bakdata.dedupe.fusion.FusionException
- FusionException(String, Throwable, boolean, boolean) - Constructor for exception com.bakdata.dedupe.fusion.FusionException
- FusionException(Throwable) - Constructor for exception com.bakdata.dedupe.fusion.FusionException
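The first(...) entries in CommonSimilarityMeasures describe returning the first similarity that is not unknown. The following is a minimal, library-independent sketch of that idea. It assumes that "unknown" is modeled as NaN and uses plain ToDoubleBiFunction instances in place of SimilarityMeasure; the class name FirstKnownSketch and its method are hypothetical and not part of com.bakdata.dedupe.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

final class FirstKnownSketch {
    // Assumption for this sketch only: an unknown similarity is represented as NaN.
    public static final double UNKNOWN = Double.NaN;

    /** Applies the measures in order and returns the first result that is not unknown. */
    public static <T> double first(List<ToDoubleBiFunction<T, T>> measures, T left, T right) {
        for (ToDoubleBiFunction<T, T> measure : measures) {
            double similarity = measure.applyAsDouble(left, right);
            if (!Double.isNaN(similarity)) {
                return similarity;
            }
        }
        // Every measure was unknown, so the combined result is unknown as well.
        return UNKNOWN;
    }

    public static void main(String[] args) {
        ToDoubleBiFunction<String, String> alwaysUnknown = (a, b) -> UNKNOWN;
        ToDoubleBiFunction<String, String> equality = (a, b) -> a.equals(b) ? 1.0 : 0.0;
        // The first measure yields unknown, so the equality measure decides.
        System.out.println(first(List.of(alwaysUnknown, equality), "a", "a")); // prints 1.0
    }
}
```

A last(...) variant would iterate in reverse order under the same assumptions.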
G
- get(int) - Method in class com.bakdata.dedupe.clustering.Cluster
- getAggregator() - Method in class com.bakdata.dedupe.similarity.AggregatingSimilarityMeasure
-
The aggregator that will be applied on the similarity values.
- getAggregator() - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure
- getBipartiteMatcher() - Method in class com.bakdata.dedupe.similarity.MatchingSimilarity
- getCalculated() - Static method in class com.bakdata.dedupe.fusion.Source
-
A tag to indicate that the respective value has no real source as it has been created during conflict resolution.
- getCandidate() - Method in class com.bakdata.dedupe.classifier.ClassifiedCandidate
-
The classified candidate.
- getCandidateSelection() - Method in class com.bakdata.dedupe.duplicate_detection.online.OnlinePairBasedDuplicateDetection
-
The candidate selection which returns a list of candidates for each new record.
- getClassification() - Method in class com.bakdata.dedupe.classifier.ClassificationResult
-
The classification.
- getClassificationResult() - Method in class com.bakdata.dedupe.classifier.ClassifiedCandidate
-
The resulting classification.
- getClassifier() - Method in class com.bakdata.dedupe.clustering.RefineCluster
-
The classifier used to score the edges.
- getClassifier() - Method in class com.bakdata.dedupe.duplicate_detection.online.OnlinePairBasedDuplicateDetection
-
Classifier to label the candidates.
- getClazz() - Method in class com.bakdata.dedupe.fusion.Merge.MergeBuilder
- getClazz() - Method in class com.bakdata.util.FunctionalClass
-
The wrapped class.
- getClosure() - Method in class com.bakdata.dedupe.clustering.RefinedTransitiveClosure
-
The underlying transitive closure implementation.
- getClusterIdGenerator() - Method in interface com.bakdata.dedupe.clustering.Clustering
-
The cluster id generator that is used to create an id for a new cluster.
- getClusterIdGenerator() - Method in class com.bakdata.dedupe.clustering.ConsistentClustering
- getClusterIdGenerator() - Method in class com.bakdata.dedupe.clustering.OracleClustering
- getClusterIdGenerator() - Method in class com.bakdata.dedupe.clustering.RefineCluster
-
A function to generate the id for newly split clusters.
- getClusterIdGenerator() - Method in class com.bakdata.dedupe.clustering.RefinedTransitiveClosure
- getClusterIdGenerator() - Method in class com.bakdata.dedupe.clustering.TransitiveClosure
-
A function to generate the id for newly formed clusters.
- getClusterIndex() - Method in class com.bakdata.dedupe.clustering.TransitiveClosure
-
A backing map for old clusters.
- getClustering() - Method in class com.bakdata.dedupe.clustering.ConsistentClustering
-
The wrapped clustering.
- getClustering() - Method in class com.bakdata.dedupe.duplicate_detection.online.OnlinePairBasedDuplicateDetection
-
Clustering algorithm to form coherent clusters of labeled duplicates.
- getConfidence() - Method in class com.bakdata.dedupe.classifier.ClassificationResult
-
Some confidence value that depends on the Classifier implementation.
- getConstructor() - Method in class com.bakdata.util.FunctionalClass
-
Returns the no-arg constructor as a Supplier.
- getContainingCluster(Iterator<? extends Cluster<C, T>>, T) - Static method in class com.bakdata.dedupe.clustering.Clusters
-
Finds the cluster containing a given record and assures that there is exactly one.
- getContextSupplier() - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier
-
Factory that creates the SimilarityContext before classifying an incoming Candidate.
- getCtor() - Method in class com.bakdata.dedupe.fusion.Merge
- getCtor() - Method in class com.bakdata.dedupe.fusion.Merge.MergeBuilder
- getCtor() - Method in class com.bakdata.util.FunctionalConstructor
-
The no-arg constructor.
- getDateTime() - Method in class com.bakdata.dedupe.fusion.AnnotatedValue
-
The time of creation/modification.
- getDefaultClassificationResult() - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier
-
Fallback value, when no rule applied.
- getDefaultWindowSize() - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod
-
The default window size, when not explicitly given.
- getDescriptor() - Method in class com.bakdata.util.FunctionalProperty
-
The wrapped property.
- getDuplicateDetection() - Method in class com.bakdata.dedupe.deduplication.online.FusingOnlineDuplicateDetection
-
The duplicate detection returning duplicate clusters.
- getElements() - Method in class com.bakdata.dedupe.clustering.Cluster
-
The list of elements.
- getExceptionContext() - Method in class com.bakdata.dedupe.fusion.FusionContext
- getExceptionContext() - Method in class com.bakdata.dedupe.similarity.SimilarityContext
-
The exception context.
- getExceptions() - Method in class com.bakdata.dedupe.fusion.FusedValue
-
All exceptions that occurred during fusion.
- getExceptions() - Method in class com.bakdata.dedupe.fusion.FusionContext
- getExceptions() - Method in class com.bakdata.dedupe.similarity.SimilarityContext
- getExceptions() - Method in class com.bakdata.util.ExceptionContext
-
The captured exceptions.
- getExplanation() - Method in class com.bakdata.dedupe.classifier.ClassificationResult
-
Additional explanation for humans, such as the similarity and threshold (0.933 >= 0.9) or the name of a rule that triggered.
- getFieldMergeBuilder() - Method in class com.bakdata.dedupe.fusion.Merge.IllTypedFieldMergeBuilder
- getFieldMerges() - Method in class com.bakdata.dedupe.fusion.Merge
- getFieldMerges() - Method in class com.bakdata.dedupe.fusion.Merge.MergeBuilder
- getFirst() - Method in class com.bakdata.dedupe.candidate_selection.CompositeValue
-
The first element.
- getFirst() - Method in class com.bakdata.dedupe.matching.WeightedEdge
-
The first vertex.
- getFusion() - Method in class com.bakdata.dedupe.deduplication.online.FusingOnlineDuplicateDetection
-
The fusion implementation that reconciles the clusters into new records.
- getGetter() - Method in class com.bakdata.dedupe.fusion.Merge.FieldMergeBuilder
- getGetter() - Method in class com.bakdata.util.FunctionalProperty
-
Returns the getter as a Function that takes the instance as a parameter.
- getGoldClusters() - Method in class com.bakdata.dedupe.clustering.OracleClustering
-
The gold clustering.
- getGoldDuplicates() - Method in class com.bakdata.dedupe.classifier.OracleClassifier
-
The set of real duplicates.
- getId() - Method in class com.bakdata.dedupe.clustering.Cluster
-
The identifier of the cluster.
- getIdExtractor() - Method in class com.bakdata.dedupe.clustering.ConsistentClustering
-
A function to extract the id of a record for efficient, internal data structures.
- getIdExtractor() - Method in class com.bakdata.dedupe.clustering.OracleClustering
-
A function to extract the id of a record for efficient, internal data structures.
- getIdExtractor() - Method in class com.bakdata.dedupe.clustering.RefinedTransitiveClosure
-
Extracts the id of the record.
- getIdExtractor() - Method in class com.bakdata.dedupe.clustering.TransitiveClosure
-
Extracts the id of the record.
- getIdExtractor() - Method in class com.bakdata.dedupe.similarity.CachingSimilarity
- getIncompleteFusionHandler() - Method in class com.bakdata.dedupe.deduplication.online.FusingOnlineDuplicateDetection
-
A callback for non-trivial clusters.
- getInner() - Method in class com.bakdata.dedupe.fusion.Merge.AdditionalFieldMergeBuilder
- getInner() - Method in class com.bakdata.dedupe.similarity.CutoffSimiliarityMeasure
-
The similarity measure that is applied first before applying the threshold.
- getKeyExtractor() - Method in class com.bakdata.dedupe.candidate_selection.SortingKey
-
A calculation or simple value access to extract the key.
- getLastModifiedExtractor() - Method in class com.bakdata.dedupe.fusion.ConflictResolutionFusion
-
A function that extracts the last modification timestamp of a record.
- getMaxSmallClusterSize() - Method in class com.bakdata.dedupe.clustering.RefineCluster
-
The maximum size (inclusive) of a cluster.
- getMeasure() - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier.Rule
- getMeasure() - Method in class com.bakdata.dedupe.similarity.CachingSimilarity
- getMeasure() - Method in class com.bakdata.dedupe.similarity.TransformingSimilarityMeasure
-
The similarity measure to apply on the transformed values.
- getMeasure() - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedSimilarity
- getMergeBuilder() - Method in class com.bakdata.dedupe.fusion.Merge.FieldMergeBuilder
- getMethod() - Method in class com.bakdata.util.FunctionalMethod
-
The wrapped method.
- getName() - Method in class com.bakdata.dedupe.candidate_selection.SortingKey
-
The name of the sorting key.
- getName() - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier.Rule
- getName() - Method in class com.bakdata.dedupe.fusion.ResolutionTag
-
The name of the tag for debugging.
- getName() - Method in class com.bakdata.dedupe.fusion.Source
-
The name of the source (mostly for debugging).
- getNewRecord() - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineCandidate
-
The new record that triggered the OnlineCandidateSelection.
- getNextFreeMen() - Method in class com.bakdata.dedupe.matching.AbstractStableMarriage.AbstractMatcher
- getNonNullSimilarity(CharSequence, CharSequence, SimilarityContext) - Method in class com.bakdata.dedupe.similarity.Levenshtein
- getNonNullSimilarity(C, C, SimilarityContext) - Method in interface com.bakdata.dedupe.similarity.CollectionSimilarityMeasure
- getNonNullSimilarity(R, R, SimilarityContext) - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure
- getNonNullSimilarity(T, T, SimilarityContext) - Method in class com.bakdata.dedupe.similarity.AggregatingSimilarityMeasure
- getNonNullSimilarity(T, T, SimilarityContext) - Method in class com.bakdata.dedupe.similarity.CachingSimilarity
- getNonNullSimilarity(T, T, SimilarityContext) - Method in class com.bakdata.dedupe.similarity.CutoffSimiliarityMeasure
- getNonNullSimilarity(T, T, SimilarityContext) - Method in interface com.bakdata.dedupe.similarity.SimilarityMeasure
-
Calculates the similarity of the left and right value.
- getNonNullSimilarity(T, T, SimilarityContext) - Method in class com.bakdata.dedupe.similarity.TransformingSimilarityMeasure
- getOldClusterIndex() - Method in class com.bakdata.dedupe.clustering.RefinedTransitiveClosure
-
A backing map for old clusters.
- getOldRecord() - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineCandidate
-
The old record already known to the OnlineCandidateSelection.
- getOriginalValues() - Method in class com.bakdata.dedupe.fusion.FusedValue
-
The original values.
- getPairMeasure() - Method in class com.bakdata.dedupe.similarity.MatchingSimilarity
- getPasses() - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod
-
The different passes used to select the candidates.
- getPossibleDuplicateHandler() - Method in class com.bakdata.dedupe.duplicate_detection.online.OnlinePairBasedDuplicateDetection
-
A callback for Classification.POSSIBLE_DUPLICATEs.
- getRecord1() - Method in interface com.bakdata.dedupe.candidate_selection.Candidate
-
Returns the first record.
- getRecord1() - Method in class com.bakdata.dedupe.candidate_selection.offline.OfflineCandidate
-
The first record.
- getRecord1() - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineCandidate
- getRecord2() - Method in interface com.bakdata.dedupe.candidate_selection.Candidate
-
Returns the second record.
- getRecord2() - Method in class com.bakdata.dedupe.candidate_selection.offline.OfflineCandidate
-
The second record.
- getRecord2() - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineCandidate
- getRefineCluster() - Method in class com.bakdata.dedupe.clustering.RefinedTransitiveClosure
-
The configured refineCluster.
- getResolution() - Method in class com.bakdata.dedupe.fusion.Merge.IllTypedFieldMergeBuilder
- getRootResolution() - Method in class com.bakdata.dedupe.fusion.ConflictResolutionFusion
-
The root resolution function; usually, Merge.
- getRules() - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier
-
The set of rules that are applied in the order of addition.
- getSecond() - Method in class com.bakdata.dedupe.candidate_selection.CompositeValue
-
The second element.
- getSecond() - Method in class com.bakdata.dedupe.matching.WeightedEdge
-
The second vertex.
- getSetter() - Method in class com.bakdata.dedupe.fusion.Merge.FieldMergeBuilder
- getSetter() - Method in class com.bakdata.util.FunctionalProperty
-
Returns the setter as a Function that takes the instance and the new value as parameters.
- getSimilarity(T, T, SimilarityContext) - Method in interface com.bakdata.dedupe.similarity.SimilarityMeasure
-
Calculates the similarity of the left and right value.
- getSimilarityForNull(T, T, SimilarityContext) - Method in class com.bakdata.dedupe.similarity.SimilarityContext
-
Calculates the similarity when any of the two values under comparison is null.
- getSimilarityMeasureForNull() - Method in class com.bakdata.dedupe.similarity.SimilarityContext
-
The similarity measure to use when any of the two values under comparison is null.
- getSimilarityMeasures() - Method in class com.bakdata.dedupe.similarity.AggregatingSimilarityMeasure
-
The similarity measures that will be successively applied to the input values.
- getSortingKey() - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod.Pass
-
The sorting key to use in this pass.
- getSource() - Method in class com.bakdata.dedupe.fusion.AnnotatedValue
-
The source of the value.
- getSourceExtractor() - Method in class com.bakdata.dedupe.fusion.ConflictResolutionFusion
-
Finds the name of the source, which can then be used to retrieve the respective source from ConflictResolutionFusion.getSources().
- getSources() - Method in class com.bakdata.dedupe.fusion.ConflictResolutionFusion
-
The list of possible sources.
- getSplitHandler() - Method in class com.bakdata.dedupe.clustering.RefinedTransitiveClosure
-
A callback that may veto cluster splits.
- getStableMatches() - Method in class com.bakdata.dedupe.matching.AbstractStableMarriage.AbstractMatcher
- getStableMatches() - Method in interface com.bakdata.dedupe.matching.AbstractStableMarriage.Matcher
- getStoredValues() - Method in class com.bakdata.dedupe.fusion.FusionContext
- getStrictSuccessors(Queue<List<Integer>>, Integer) - Static method in class com.bakdata.dedupe.matching.AbstractStableMarriage.AbstractMatcher
- getSuccessors(Queue<List<Integer>>, Integer) - Static method in class com.bakdata.dedupe.matching.AbstractStableMarriage.AbstractMatcher
- getTail(Queue<List<Integer>>, Integer) - Static method in class com.bakdata.dedupe.matching.AbstractStableMarriage.AbstractMatcher
- getThreshold() - Method in class com.bakdata.dedupe.similarity.CutoffSimiliarityMeasure
-
The threshold that divides the dissimilar and the similar values.
- getThreshold() - Method in class com.bakdata.dedupe.similarity.Levenshtein
-
The threshold [0; 1], below which the calculation should be aborted.
- getTransformation() - Method in class com.bakdata.dedupe.similarity.TransformingSimilarityMeasure
-
The transformation that is applied to both inputs before calculating the similarity.
- getUnknown() - Static method in class com.bakdata.dedupe.fusion.Source
-
The unknown source is used whenever source extraction in Fusion failed.
- getValue() - Method in class com.bakdata.dedupe.fusion.AnnotatedValue
-
The wrapped value.
- getValue() - Method in class com.bakdata.dedupe.fusion.FusedValue
-
The resulting value.
- getValue() - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedValue
- getWeight() - Method in class com.bakdata.dedupe.fusion.Source
-
The weight of the source, mostly used for weighted majority voting.
- getWeight() - Method in class com.bakdata.dedupe.matching.WeightedEdge
-
The weight.
- getWeight() - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedSimilarity
- getWeight() - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedValue
- getWeightedSimilarities() - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure
- getWeightedValue() - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedValue
- getWindowSize() - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod.Pass
-
The window size (>= 2).
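The OnlineSortedNeighborhoodMethod entries in this section mention passes, a sorting key per pass, and a window size of at least 2. As a rough illustration of how these pieces interact, here is a sketch of one pass of the classic batch sorted neighborhood method. It is not the library's online implementation; the class SortedNeighborhoodSketch, its method, and the representation of a candidate as a two-element list are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

final class SortedNeighborhoodSketch {
    /**
     * One pass: sort the records by the key, then pair each record with the
     * next (windowSize - 1) records in sort order.
     */
    public static <T, K extends Comparable<K>> List<List<T>> candidates(
            List<T> records, Function<T, K> sortingKey, int windowSize) {
        if (windowSize < 2) {
            throw new IllegalArgumentException("window size must be >= 2");
        }
        List<T> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparing(sortingKey));

        List<List<T>> pairs = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            // The window covers positions i .. i + windowSize - 1 (clipped at the end).
            int end = Math.min(sorted.size(), i + windowSize);
            for (int j = i + 1; j < end; j++) {
                pairs.add(List.of(sorted.get(i), sorted.get(j)));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> names = List.of("smith", "adams", "smyth", "adam");
        // prints [[adam, adams], [adams, smith], [smith, smyth]]
        System.out.println(candidates(names, name -> name, 2));
    }
}
```

Several passes with different sorting keys would simply union the pairs of each pass; a window of 1 would produce no pairs, which is consistent with the stated lower bound of 2.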
H
- handlePartiallyFusedValue(FusedValue<T>) - Method in interface com.bakdata.dedupe.fusion.IncompleteFusionHandler
- hashCode() - Method in class com.bakdata.dedupe.candidate_selection.CompositeValue
- hashCode() - Method in class com.bakdata.dedupe.candidate_selection.offline.OfflineCandidate
- hashCode() - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineCandidate
- hashCode() - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod
- hashCode() - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod.Pass
- hashCode() - Method in class com.bakdata.dedupe.candidate_selection.SortingKey
- hashCode() - Method in class com.bakdata.dedupe.classifier.ClassificationResult
- hashCode() - Method in class com.bakdata.dedupe.classifier.ClassifiedCandidate
- hashCode() - Method in class com.bakdata.dedupe.classifier.OracleClassifier
- hashCode() - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier
- hashCode() - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier.Rule
- hashCode() - Method in class com.bakdata.dedupe.clustering.Cluster
- hashCode() - Method in class com.bakdata.dedupe.clustering.ConsistentClustering
- hashCode() - Method in class com.bakdata.dedupe.clustering.OracleClustering
- hashCode() - Method in class com.bakdata.dedupe.clustering.RefineCluster
- hashCode() - Method in class com.bakdata.dedupe.clustering.RefinedTransitiveClosure
- hashCode() - Method in class com.bakdata.dedupe.clustering.TransitiveClosure
- hashCode() - Method in class com.bakdata.dedupe.deduplication.online.FusingOnlineDuplicateDetection
- hashCode() - Method in class com.bakdata.dedupe.duplicate_detection.online.OnlinePairBasedDuplicateDetection
- hashCode() - Method in class com.bakdata.dedupe.fusion.AnnotatedValue
- hashCode() - Method in class com.bakdata.dedupe.fusion.ConflictResolutionFusion
- hashCode() - Method in class com.bakdata.dedupe.fusion.FusedValue
- hashCode() - Method in class com.bakdata.dedupe.fusion.FusionContext
- hashCode() - Method in class com.bakdata.dedupe.fusion.Merge.AdditionalFieldMergeBuilder
- hashCode() - Method in class com.bakdata.dedupe.fusion.Merge.FieldMergeBuilder
- hashCode() - Method in class com.bakdata.dedupe.fusion.Merge
- hashCode() - Method in class com.bakdata.dedupe.fusion.Merge.IllTypedFieldMergeBuilder
- hashCode() - Method in class com.bakdata.dedupe.fusion.Merge.MergeBuilder
- hashCode() - Method in class com.bakdata.dedupe.fusion.ResolutionTag
- hashCode() - Method in class com.bakdata.dedupe.fusion.Source
- hashCode() - Method in class com.bakdata.dedupe.matching.StronglyStableMarriage
- hashCode() - Method in class com.bakdata.dedupe.matching.WeaklyStableMarriage
- hashCode() - Method in class com.bakdata.dedupe.matching.WeightedEdge
- hashCode() - Method in class com.bakdata.dedupe.similarity.AggregatingSimilarityMeasure
- hashCode() - Method in class com.bakdata.dedupe.similarity.CachingSimilarity
- hashCode() - Method in class com.bakdata.dedupe.similarity.CutoffSimiliarityMeasure
- hashCode() - Method in class com.bakdata.dedupe.similarity.Levenshtein
- hashCode() - Method in class com.bakdata.dedupe.similarity.MatchingSimilarity
- hashCode() - Method in class com.bakdata.dedupe.similarity.SimilarityContext
- hashCode() - Method in class com.bakdata.dedupe.similarity.TransformingSimilarityMeasure
- hashCode() - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure
- hashCode() - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedSimilarity
- hashCode() - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedValue
- hashCode() - Method in class com.bakdata.util.ExceptionContext
- hashCode() - Method in class com.bakdata.util.FunctionalClass
- hashCode() - Method in class com.bakdata.util.FunctionalConstructor
- hashCode() - Method in class com.bakdata.util.FunctionalMethod
- hashCode() - Method in class com.bakdata.util.FunctionalProperty
I
- id(C) - Method in class com.bakdata.dedupe.clustering.Cluster.ClusterBuilder
- identity(SimilarityMeasure<T>) - Static method in class com.bakdata.dedupe.similarity.CachingSimilarity
-
Creates a cache based on the object identity; that is, the cache only triggers if the exact same instances are compared.
- idExtractor(Function<? super T, ? extends I>) - Method in class com.bakdata.dedupe.clustering.RefinedTransitiveClosure.RefinedTransitiveClosureBuilder
- idExtractor(Function<? super T, ? extends I>) - Method in class com.bakdata.dedupe.clustering.TransitiveClosure.TransitiveClosureBuilder
- idExtractor(Function<T, I>) - Method in class com.bakdata.dedupe.clustering.ConsistentClustering.ConsistentClusteringBuilder
- ignore() - Static method in interface com.bakdata.dedupe.clustering.ClusterSplitHandler
-
Do nothing.
- IllTypedFieldMergeBuilder(Merge.FieldMergeBuilder<F, R>, ConflictResolution<F, I>) - Constructor for class com.bakdata.dedupe.fusion.Merge.IllTypedFieldMergeBuilder
- incompleteFusionHandler(IncompleteFusionHandler<T>) - Method in class com.bakdata.dedupe.deduplication.online.FusingOnlineDuplicateDetection.FusingOnlineDuplicateDetectionBuilder
- IncompleteFusionHandler<T> - Interface in com.bakdata.dedupe.fusion
-
A callback that allows incomplete fusions to be treated either with direct repair algorithms or through side-channel means (that is, ignore for now and repair asynchronously with the help of domain experts).
- inequality() - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
A similarity measure that is 0 if left.equals(right) or 1 otherwise.
- intGenerator() - Static method in class com.bakdata.dedupe.clustering.ClusterIdGenerators
-
Returns an id generator that generates ints starting from 0.
- invoke(T, Object...) - Method in class com.bakdata.util.FunctionalMethod
-
Invokes the method on an instance with the given parameters.
- isAmbiguous() - Method in enum com.bakdata.dedupe.classifier.Classification
-
Returns whether this class can be used directly.
- isNonEmpty(Object) - Method in class com.bakdata.dedupe.fusion.FusionContext
- isSymmetric() - Method in class com.bakdata.dedupe.similarity.AggregatingSimilarityMeasure
- isSymmetric() - Method in class com.bakdata.dedupe.similarity.Levenshtein
- isSymmetric() - Method in interface com.bakdata.dedupe.similarity.SimilarityMeasure
-
Returns true if sim(a, b) = sim(b, a).
- isSymmetric() - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure
- isUnknown() - Method in enum com.bakdata.dedupe.classifier.Classification
- isUnknown(double) - Static method in interface com.bakdata.dedupe.similarity.SimilarityMeasure
-
Checks whether the supplied value is SimilarityMeasure.unknown().
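The identity(SimilarityMeasure<T>) entry in this section describes a cache that only triggers when the exact same instances are compared. A minimal sketch of such identity-based caching, using java.util.IdentityHashMap, could look as follows. IdentityCacheSketch and its members are hypothetical names for illustration and do not mirror the actual CachingSimilarity implementation.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

final class IdentityCacheSketch<T> {
    private final ToDoubleBiFunction<T, T> inner;
    // Outer key: left instance, inner key: right instance. Keys are compared with ==, not equals().
    private final Map<T, Map<T, Double>> cache = new IdentityHashMap<>();
    private int misses = 0;

    public IdentityCacheSketch(ToDoubleBiFunction<T, T> inner) {
        this.inner = inner;
    }

    public double similarity(T left, T right) {
        Map<T, Double> row = cache.computeIfAbsent(left, key -> new IdentityHashMap<>());
        Double cached = row.get(right);
        if (cached == null) {
            // Cache miss: delegate to the wrapped measure and remember the result.
            misses++;
            cached = inner.applyAsDouble(left, right);
            row.put(right, cached);
        }
        return cached;
    }

    public int misses() {
        return misses;
    }

    public static void main(String[] args) {
        IdentityCacheSketch<String> cache =
                new IdentityCacheSketch<>((a, b) -> a.equals(b) ? 1.0 : 0.0);
        String first = new String("anna");
        String second = new String("anna");
        cache.similarity(first, second);               // miss
        cache.similarity(first, second);               // hit: same instances
        cache.similarity(first, new String("anna"));   // miss: equal, but a different instance
        System.out.println(cache.misses());            // prints 2
    }
}
```

The sketch is not thread-safe and keeps strong references to all compared instances, which a production cache would likely need to address.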
J
- jaccard() - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Treats the input collections as sets (removing duplicate elements) and calculates the size of the intersection over the size of the union.
- jaroWinkler() - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Jaro-Winkler similarity counts the number of matched and transposed characters with a boost for initial characters.
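The jaccard() description above (input collections treated as sets, size of the intersection over size of the union) can be sketched in plain Java as shown below. JaccardSketch is a hypothetical, self-contained class and not the library's implementation; returning 1.0 for two empty inputs is an assumption of this sketch.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class JaccardSketch {
    public static <E> double jaccard(Collection<? extends E> left, Collection<? extends E> right) {
        // Converting to sets removes duplicate elements.
        Set<E> leftSet = new HashSet<>(left);
        Set<E> rightSet = new HashSet<>(right);

        Set<E> union = new HashSet<>(leftSet);
        union.addAll(rightSet);
        if (union.isEmpty()) {
            return 1.0; // assumption: two empty collections count as identical
        }

        Set<E> intersection = new HashSet<>(leftSet);
        intersection.retainAll(rightSet);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // {a, b, c} and {b, c, d}: intersection has 2 elements, union has 4.
        System.out.println(jaccard(List.of("a", "b", "b", "c"), List.of("b", "c", "d"))); // prints 0.5
    }
}
```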
K
- keep() - Static method in interface com.bakdata.dedupe.duplicate_detection.PossibleDuplicateHandler
-
Ignores possible duplicates and returns them as is, which will most likely result in dropping them.
L
- last() - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Picks the last value out of the bag of values.
- last(Iterable<? extends SimilarityMeasure<? super T>>) - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Returns the last similarity of the given similarity measures that is not SimilarityMeasure.unknown().
- last(SimilarityMeasure<? super T>...) - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Returns the last similarity of the given similarity measures that is not SimilarityMeasure.unknown().
- lastModifiedExtractor(Function<R, LocalDateTime>) - Method in class com.bakdata.dedupe.fusion.ConflictResolutionFusion.ConflictResolutionFusionBuilder
- latest() - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Selects all values with the same latest AnnotatedValue.getDateTime().
- levenshtein() - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Returns the Levenshtein similarity measure.
- Levenshtein<T extends java.lang.CharSequence> - Class in com.bakdata.dedupe.similarity
-
Provides the Levenshtein similarity calculation, which calculates the number of insertions, deletions, and replacements needed to transform one string into another.
- Levenshtein(double) - Constructor for class com.bakdata.dedupe.similarity.Levenshtein
- longest() - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Selects all values with the same longest length.
- longGenerator() - Static method in class com.bakdata.dedupe.clustering.ClusterIdGenerators
-
Returns an id generator that generates longs starting from 0.
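The Levenshtein entries above count insertions, deletions, and replacements. A minimal sketch of that count, turned into a similarity with the 1 - dist / maxDist normalization that this index describes for toSimilarity, might look as follows. Whether the Levenshtein class normalizes in exactly this way is an assumption, and LevenshteinSketch is a hypothetical name.

```java
// Hypothetical helper, not part of com.bakdata.dedupe.
final class LevenshteinSketch {
    /** Number of insertions, deletions, and replacements needed to turn a into b. */
    static int distance(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning the empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed normalization: 1 - dist / maxDist, with maxDist the longer input length. */
    static double similarity(CharSequence a, CharSequence b) {
        int maxDist = Math.max(a.length(), b.length());
        if (maxDist == 0) {
            return 1.0; // assumption: two empty strings are equal
        }
        return 1.0 - (double) distance(a, b) / maxDist;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
        System.out.println(similarity("kitten", "sitting"));
    }
}
```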
M
- match() - Method in class com.bakdata.dedupe.matching.AbstractStableMarriage.AbstractMatcher
- match(Collection<WeightedEdge<T>>, Collection<WeightedEdge<T>>) - Method in class com.bakdata.dedupe.matching.AbstractStableMarriage
- match(Collection<WeightedEdge<T>>, Collection<WeightedEdge<T>>) - Method in interface com.bakdata.dedupe.matching.BipartiteMatcher
-
Finds the set of edges forming a matching that maximizes the sum of the edge weights.
- matching(BipartiteMatcher<E>, SimilarityMeasure<? super E>) - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Uses the given pair SimilarityMeasure to calculate a preference matrix and uses the BipartiteMatcher to find the best matching entries.
- MatchingSimilarity<C extends java.util.Collection<? extends E>,E> - Class in com.bakdata.dedupe.similarity
-
A similarity measure that finds the best matches between two bags of entities and calculates an overall similarity by summing the similarities of these matches and normalizing the sum by the (max) number of elements per collection.
- MatchingSimilarity(BipartiteMatcher<E>, SimilarityMeasure<E>) - Constructor for class com.bakdata.dedupe.similarity.MatchingSimilarity
- matchMaterialized(Collection<WeightedEdge<T>>, Collection<WeightedEdge<T>>) - Method in interface com.bakdata.dedupe.matching.BipartiteMatcher
-
Finds the set of edges forming a matching that maximizes the sum of the edge weights.
- materializedDeduplicate(Iterable<? extends T>) - Method in interface com.bakdata.dedupe.deduplication.Deduplication
-
Deduplicates the dataset.
- materializedDeduplicate(Iterable<? extends T>, Function<? super T, Object>) - Method in interface com.bakdata.dedupe.deduplication.Deduplication
-
Selects the candidates for the given records and materializes them.
- materializeDuplicates(Iterable<? extends T>) - Method in interface com.bakdata.dedupe.duplicate_detection.DuplicateDetection
-
Finds all duplicates in the dataset.
- max() - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Selects all values that are maximal.
- max(SimilarityMeasure<? super T>...) - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Returns the largest similarity of the given similarity measures.
- maxSmallClusterSize(int) - Method in class com.bakdata.dedupe.clustering.RefineCluster.RefineClusterBuilder
- mean() - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Calculates the mean value of a bag of conflicting numbers.
- mean(SimilarityMeasure<? super T>...) - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Returns the mean similarity of the given similarity measures.
- median() - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Calculates the median value of a bag of conflicting numbers.
- mensFavoriteWomen - Variable in class com.bakdata.dedupe.matching.AbstractStableMarriage.AbstractMatcher
- merge(Function<? super Iterable<? extends T>, ? extends C>, Cluster<C, ? extends T>) - Method in class com.bakdata.dedupe.clustering.Cluster
-
Merges this cluster with another cluster into one new cluster.
- merge(Supplier<T>) - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Starts the creation of a Merge conflict resolution for record types.
- merge(Class<T>) - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Starts the creation of a Merge conflict resolution for record types.
- Merge<R> - Class in com.bakdata.dedupe.fusion
-
A nested conflict resolution for complex types.
- Merge.AdditionalFieldMergeBuilder<F,R> - Class in com.bakdata.dedupe.fusion
- Merge.FieldMergeBuilder<F,R> - Class in com.bakdata.dedupe.fusion
- Merge.IllTypedFieldMergeBuilder<I,F,R> - Class in com.bakdata.dedupe.fusion
- Merge.MergeBuilder<R> - Class in com.bakdata.dedupe.fusion
- MergeBuilder(Supplier<R>, FunctionalClass<R>) - Constructor for class com.bakdata.dedupe.fusion.Merge.MergeBuilder
- min() - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Selects all values that are minimal.
- min(SimilarityMeasure<? super T>...) - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Returns the smallest similarity of the given similarity measures.
- mongeElkan(SimilarityMeasure<E>) - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Monge-Elkan is a simple list-based similarity measure, where elements from the left are matched with elements from the right with the highest similarity within a certain index range.
- mongeElkan(SimilarityMeasure<E>, int) - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Monge-Elkan is a simple list-based similarity measure, where elements from the left are matched with elements from the right with the highest similarity within a certain index range.
- mostFrequent() - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Selects the most frequent values.
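The mongeElkan entries above describe matching each left element with the most similar right element inside an index range. The sketch below shows one plausible reading: the per-element maxima are averaged over the left list, which is the textbook Monge-Elkan normalization and an assumption here. The parameter name maxPositionDiff, the handling of empty lists, and the class name are likewise hypothetical.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical helper, not part of com.bakdata.dedupe.
final class MongeElkanSketch {
    /**
     * For each left element, takes the highest pair similarity among right elements whose
     * index differs by at most maxPositionDiff, then averages these maxima over the left list.
     */
    static <E> double mongeElkan(List<E> left, List<E> right,
            ToDoubleBiFunction<E, E> pairSimilarity, int maxPositionDiff) {
        if (left.isEmpty()) {
            return right.isEmpty() ? 1.0 : 0.0; // assumption: empty vs. empty counts as equal
        }
        double sum = 0.0;
        for (int i = 0; i < left.size(); i++) {
            // long arithmetic avoids overflow for very large index ranges
            int from = (int) Math.max(0L, (long) i - maxPositionDiff);
            int to = (int) Math.min((long) right.size() - 1, (long) i + maxPositionDiff);
            double best = 0.0; // left elements without a partner in range contribute 0
            for (int j = from; j <= to; j++) {
                best = Math.max(best, pairSimilarity.applyAsDouble(left.get(i), right.get(j)));
            }
            sum += best;
        }
        return sum / left.size();
    }

    public static void main(String[] args) {
        ToDoubleBiFunction<String, String> exact = (l, r) -> l.equals(r) ? 1.0 : 0.0;
        // With a range of 1 the swapped elements still find each other.
        System.out.println(mongeElkan(List.of("a", "b"), List.of("b", "a"), exact, 1)); // prints 1.0
        // With a range of 0 only equal positions are compared.
        System.out.println(mongeElkan(List.of("a", "b"), List.of("b", "a"), exact, 0)); // prints 0.0
    }
}
```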
N
- negate() - Method in interface com.bakdata.dedupe.similarity.SimilarityMeasure
-
Swaps the lower and upper bound, such that equal pairs have a similarity of 0 and unequal pairs of 1.
- negate(SimilarityMeasure<T>) - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Swaps the lower and upper bound, such that equal pairs have a similarity of 0 and unequal pairs of 1.
- negativeRule(String, SimilarityMeasure<? super T>) - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier.RuleBasedClassifierBuilder
-
Creates a negative rule that is always applied.
- negativeRule(String, BiPredicate<T, T>, SimilarityMeasure<? super T>) - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier.RuleBasedClassifierBuilder
-
Creates a negative rule that is applied only when the precondition holds.
- newInstance() - Method in class com.bakdata.util.FunctionalConstructor
-
Creates a new instance.
- ngram(int) - Static method in class com.bakdata.dedupe.similarity.CommonTransformations
-
Creates a tokenizer that splits a string into n-grams; that is, runs of two or more consecutive characters.
- NON_DUPLICATE - com.bakdata.dedupe.classifier.Classification
-
A sure non-duplicate.
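The n-gram tokenization mentioned for ngram(int) can be sketched as a sliding window over the string. The helper below is an illustration only; returning an empty list for inputs shorter than n is an assumption, and the library's tokenizer may pad or treat such inputs differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of com.bakdata.dedupe.
final class NGramSketch {
    /** Returns all substrings of length n, moving one character at a time. */
    static List<String> ngrams(String value, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= value.length(); i++) {
            result.add(value.substring(i, i + n));
        }
        return result; // assumption: empty when the input is shorter than n
    }

    public static void main(String[] args) {
        System.out.println(ngrams("dedupe", 2)); // prints [de, ed, du, up, pe]
    }
}
```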
O
- of(ValueTransformation<O, ? extends T>) - Method in interface com.bakdata.dedupe.similarity.SimilarityMeasure
-
Applies a ValueTransformation to the left and right value of a similarity comparison before applying this similarity measure.
- of(Class<T>) - Static method in class com.bakdata.util.FunctionalClass
- of(Function<O, ? extends T>) - Method in interface com.bakdata.dedupe.similarity.SimilarityMeasure
-
Applies a ValueTransformation to the left and right value of a similarity comparison before applying this similarity measure.
- of(Class<T>) - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier.RuleBasedClassifierBuilder
-
Fluent cast of the type parameter.
- of(Class<T>) - Method in class com.bakdata.dedupe.similarity.AggregatingSimilarityMeasure.AggregatingSimilarityMeasureBuilder
-
Fluent cast of the type parameter.
- of(Class<T>) - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedAggregatingSimilarityMeasureBuilder
-
Fluent cast of the type parameter.
- of(T1, T2) - Static method in class com.bakdata.dedupe.candidate_selection.CompositeValue
-
Creates a composite value of the two values.
- OfflineCandidate<T> - Class in com.bakdata.dedupe.candidate_selection.offline
-
Represents a candidate pair that was generated with an OfflineCandidateSelection.
- OfflineCandidate(T, T) - Constructor for class com.bakdata.dedupe.candidate_selection.offline.OfflineCandidate
- OfflineCandidateSelection<T> - Interface in com.bakdata.dedupe.candidate_selection.offline
-
Selects candidates from a static dataset accessible through Iterables.
- OfflineDeduplication<T> - Interface in com.bakdata.dedupe.deduplication.offline
-
A full offline deduplication process, which ensures that no duplicate record is in the results.
- OfflineDuplicateDetection<C extends java.lang.Comparable<C>,T> - Interface in com.bakdata.dedupe.duplicate_detection.offline
-
The offline duplicate detection returns all duplicate records within a dataset.
- oldClusterIndex(Map<I, Cluster<C, T>>) - Method in class com.bakdata.dedupe.clustering.RefinedTransitiveClosure.RefinedTransitiveClosureBuilder
- OnlineCandidate<T> - Class in com.bakdata.dedupe.candidate_selection.online
-
Represents a candidate pair that was generated with an OnlineCandidateSelection.
- OnlineCandidate(T, T) - Constructor for class com.bakdata.dedupe.candidate_selection.online.OnlineCandidate
- OnlineCandidateSelection<T> - Interface in com.bakdata.dedupe.candidate_selection.online
-
Selects candidates from a streaming dataset by processing the incoming elements one-at-a-time.
- OnlineDeduplication<T> - Interface in com.bakdata.dedupe.deduplication.online
-
A full online deduplication process, which ensures that no duplicate record is emitted.
- OnlineDuplicateDetection<C extends java.lang.Comparable<C>,T> - Interface in com.bakdata.dedupe.duplicate_detection.online
-
An online duplicate detection algorithm processes a stream of records and returns all changed Clusters for each record.
- OnlinePairBasedDuplicateDetection<C extends java.lang.Comparable<C>,T> - Class in com.bakdata.dedupe.duplicate_detection.online
-
A pair-based duplicate detection algorithm, which performs OnlineCandidateSelection, applies a Classifier to the found Candidates, and transforms the found duplicate pairs with a Clustering into Clusters.
- OnlinePairBasedDuplicateDetection.OnlinePairBasedDuplicateDetectionBuilder<C extends java.lang.Comparable<C>,T> - Class in com.bakdata.dedupe.duplicate_detection.online
- OnlineSortedNeighborhoodMethod<T> - Class in com.bakdata.dedupe.candidate_selection.online
-
A sorted neighborhood method (SNM) for online deduplication.
- OnlineSortedNeighborhoodMethod.OnlineSortedNeighborhoodMethodBuilder<T> - Class in com.bakdata.dedupe.candidate_selection.online
- OnlineSortedNeighborhoodMethod.Pass<T,K extends java.lang.Comparable<K>> - Class in com.bakdata.dedupe.candidate_selection.online
-
Represents a pass over the dataset with a specific sorting key and window size.
- OracleClassifier<T> - Class in com.bakdata.dedupe.classifier
-
A classifier that knows the results perfectly as it receives the gold standard during creation.
- OracleClassifier(Set<Candidate<T>>) - Constructor for class com.bakdata.dedupe.classifier.OracleClassifier
- OracleClustering<C extends java.lang.Comparable<C>,T,I> - Class in com.bakdata.dedupe.clustering
-
A clustering that knows the results perfectly as it receives the gold standard during creation.
- OracleClustering(Collection<Cluster<C, T>>, Function<T, I>) - Constructor for class com.bakdata.dedupe.clustering.OracleClustering
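The online sorted neighborhood entries above describe passes with a sorting key and a window size. A minimal single-pass sketch of the idea follows: records are kept sorted by key, and each new record is paired with its nearest neighbors in sort order. The class OnlineSnmSketch and the window semantics (up to windowSize - 1 neighbors on each side) are assumptions and do not reflect how OnlineSortedNeighborhoodMethod is actually implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of com.bakdata.dedupe.
final class OnlineSnmSketch<T, K extends Comparable<K>> {
    private final Function<T, K> sortingKey;
    private final int windowSize;
    private final List<T> sorted = new ArrayList<>();

    OnlineSnmSketch(Function<T, K> sortingKey, int windowSize) {
        this.sortingKey = sortingKey;
        this.windowSize = windowSize;
    }

    /** Returns the neighbors of the new record in key order, then indexes the record. */
    List<T> selectCandidates(T newRecord) {
        Comparator<T> byKey = Comparator.comparing(sortingKey);
        int pos = Collections.binarySearch(sorted, newRecord, byKey);
        int insertAt = pos >= 0 ? pos : -pos - 1;
        // assumption: a window of size w yields up to w - 1 neighbors on each side
        int from = Math.max(0, insertAt - (windowSize - 1));
        int to = Math.min(sorted.size(), insertAt + (windowSize - 1));
        List<T> neighbors = new ArrayList<>(sorted.subList(from, to));
        sorted.add(insertAt, newRecord);
        return neighbors;
    }

    public static void main(String[] args) {
        OnlineSnmSketch<String, String> snm = new OnlineSnmSketch<>(s -> s, 2);
        System.out.println(snm.selectCandidates("apple"));  // prints []
        System.out.println(snm.selectCandidates("cherry")); // prints [apple]
        System.out.println(snm.selectCandidates("banana")); // prints [apple, cherry]
    }
}
```

Each returned neighbor would form a candidate pair with the new record; pruning the comparison to this window is what keeps the search space small.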
P
- pass(OnlineSortedNeighborhoodMethod.Pass<T, ?>) - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod.OnlineSortedNeighborhoodMethodBuilder
- Pass(SortingKey<? super T, ? extends K>, int) - Constructor for class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod.Pass
-
Creates a pass with the given sorting key and window size.
- passes(Collection<? extends OnlineSortedNeighborhoodMethod.Pass<T, ?>>) - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod.OnlineSortedNeighborhoodMethodBuilder
- positionWise(SimilarityMeasure<E>) - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Performs a simple one to one comparison of elements in two lists based on their index.
- positiveRule(String, SimilarityMeasure<? super T>) - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier.RuleBasedClassifierBuilder
-
Creates a positive rule that is always applied.
- positiveRule(String, BiPredicate<T, T>, SimilarityMeasure<? super T>) - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier.RuleBasedClassifierBuilder
-
Creates a positive rule that is applied only when the precondition holds.
- POSSIBLE_DUPLICATE - com.bakdata.dedupe.classifier.Classification
-
A possible duplicate may result from a high uncertainty of the Classifier.
- possibleDuplicateFound(ClassifiedCandidate<T>) - Method in interface com.bakdata.dedupe.duplicate_detection.PossibleDuplicateHandler
-
Invoked when a Classification.POSSIBLE_DUPLICATE has been found during DuplicateDetection.
- possibleDuplicateHandler(PossibleDuplicateHandler<T>) - Method in class com.bakdata.dedupe.duplicate_detection.online.OnlinePairBasedDuplicateDetection.OnlinePairBasedDuplicateDetectionBuilder
- PossibleDuplicateHandler<T> - Interface in com.bakdata.dedupe.duplicate_detection
-
A callback that is invoked when a Classification.POSSIBLE_DUPLICATE has been found during DuplicateDetection.
- preferSource(List<Source>) - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Chooses all values from a specific source.
- preferSource(Source...) - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Chooses all values from a specific source.
- promoteToDuplicate() - Static method in interface com.bakdata.dedupe.duplicate_detection.PossibleDuplicateHandler
-
Treats possible duplicates as full duplicates.
- propose(Integer, Integer) - Method in class com.bakdata.dedupe.matching.AbstractStableMarriage.AbstractMatcher
R
- random() - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Picks a random value out of the bag of values.
- refine(Stream<? extends Cluster<C, T>>, Stream<ClassifiedCandidate<T>>) - Method in class com.bakdata.dedupe.clustering.RefineCluster
- refineCluster(RefineCluster<C, T>) - Method in class com.bakdata.dedupe.clustering.RefinedTransitiveClosure.RefinedTransitiveClosureBuilder
- RefineCluster<C extends java.lang.Comparable<C>,T> - Class in com.bakdata.dedupe.clustering
-
Splits large clusters into smaller clusters when the inter-cluster similarities are sub-optimal.
- RefineCluster.RefineClusterBuilder<C extends java.lang.Comparable<C>,T> - Class in com.bakdata.dedupe.clustering
- refinedSoundex(@lombok.NonNull char[]) - Static method in class com.bakdata.dedupe.similarity.CommonTransformations
-
Creates a normalizer that turns strings into the refined soundex representation.
- RefinedTransitiveClosure<C extends java.lang.Comparable<C>,T,I extends java.lang.Comparable<? super I>> - Class in com.bakdata.dedupe.clustering
-
Executes TransitiveClosure and subsequently RefineCluster.
- RefinedTransitiveClosure.RefinedTransitiveClosureBuilder<C extends java.lang.Comparable<C>,T,I extends java.lang.Comparable<? super I>> - Class in com.bakdata.dedupe.clustering
- removeCluster(Cluster<C, ? extends T>) - Method in class com.bakdata.dedupe.clustering.TransitiveClosure
- ResolutionTag<T> - Class in com.bakdata.dedupe.fusion
-
A resolution tag is a way to refer to a previously resolved value in the current FusionContext.
- ResolutionTag(String) - Constructor for class com.bakdata.dedupe.fusion.ResolutionTag
- resolve(List<AnnotatedValue<I>>, FusionContext) - Method in interface com.bakdata.dedupe.fusion.ConflictResolution
-
Fully resolves the values if possible or throws a FusionException.
- resolveFully(List<AnnotatedValue<I>>, FusionContext) - Method in interface com.bakdata.dedupe.fusion.TerminalConflictResolution
- resolveNonEmptyPartially(List<AnnotatedValue<I>>, FusionContext) - Method in interface com.bakdata.dedupe.fusion.ConflictResolution
-
Tries the best to resolve the value but may end up with multiple concurrent values.
- resolveNonEmptyPartially(List<AnnotatedValue<I>>, FusionContext) - Method in interface com.bakdata.dedupe.fusion.TerminalConflictResolution
- resolveNonEmptyPartially(List<AnnotatedValue<R>>, FusionContext) - Method in class com.bakdata.dedupe.fusion.Merge
- resolvePartially(List<AnnotatedValue<I>>, FusionContext) - Method in interface com.bakdata.dedupe.fusion.ConflictResolution
-
Tries the best to resolve the value but may end up with multiple concurrent values.
- retrieveValues(ResolutionTag<T>) - Method in class com.bakdata.dedupe.fusion.FusionContext
- reversed() - Method in class com.bakdata.dedupe.matching.WeightedEdge
-
Reverses the direction of the edge by swapping the vertexes.
- rootResolution(ConflictResolution<R, R>) - Method in class com.bakdata.dedupe.fusion.ConflictResolutionFusion.ConflictResolutionFusionBuilder
- rule(RuleBasedClassifier.Rule<T>) - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier.RuleBasedClassifierBuilder
- Rule(String, SimilarityMeasure<? super T>) - Constructor for class com.bakdata.dedupe.classifier.RuleBasedClassifier.Rule
- RuleBasedClassifier<T> - Class in com.bakdata.dedupe.classifier
-
Successively applies a list of rules to the record and returns the respective ClassificationResult with the following cases: If any rule classifies the pair unambiguously as Classification.DUPLICATE or Classification.NON_DUPLICATE, the classification is immediately returned. If no rule can be applied, the classification is RuleBasedClassifier.defaultClassificationResult. There are three types of rules: Negative rules are used to exclude false positives.
- RuleBasedClassifier.Rule<T> - Class in com.bakdata.dedupe.classifier
-
A rule has a name for lineage/debugging and the similarity measure.
- RuleBasedClassifier.RuleBasedClassifierBuilder<T> - Class in com.bakdata.dedupe.classifier
-
A builder for RuleBasedClassifier with convenience methods to create positive and negative rules.
- rules(Collection<? extends RuleBasedClassifier.Rule<T>>) - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier.RuleBasedClassifierBuilder
S
- safeExecute(Runnable) - Method in class com.bakdata.util.ExceptionContext
-
Safely executes the given runnable.
- safeExecute(Callable<? extends T>) - Method in class com.bakdata.util.ExceptionContext
-
Safely executes the given function.
- safeExecute(Runnable) - Method in class com.bakdata.dedupe.fusion.FusionContext
- safeExecute(Runnable) - Method in class com.bakdata.dedupe.similarity.SimilarityContext
- safeExecute(Callable<? extends T>) - Method in class com.bakdata.dedupe.fusion.FusionContext
- safeExecute(Callable<? extends T>) - Method in class com.bakdata.dedupe.similarity.SimilarityContext
- saveAs(ConflictResolution<T, T>, ResolutionTag<T>) - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Creates a ResolutionTag for the currently resolved value.
- scaledDifference(int, TemporalUnit) - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Calculates the difference between the left and right Temporal and compares the absolute difference to maxDiff.
- scaledDifference(T) - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Calculates the difference between the left and right number and compares the absolute difference to maxDiff.
- scaleWithThreshold(double) - Method in interface com.bakdata.dedupe.similarity.SimilarityMeasure
-
Scales the similarity returned by this similarity measure, such that all values ≤ minExclusive result in a similarity of 0, and all values in (minExclusive, 1] are linearly rescaled to (0, 1].
- selectCandidates(Iterable<? extends T>) - Method in interface com.bakdata.dedupe.candidate_selection.CandidateSelection
-
Selects the candidates for the given records and materializes them.
- selectCandidates(Stream<? extends T>) - Method in interface com.bakdata.dedupe.candidate_selection.CandidateSelection
-
Selects the candidates for the given records.
- selectCandidates(Stream<? extends T>) - Method in interface com.bakdata.dedupe.candidate_selection.online.OnlineCandidateSelection
- selectCandidates(T) - Method in interface com.bakdata.dedupe.candidate_selection.online.OnlineCandidateSelection
-
Selects the candidates for a new incoming record.
- selectCandidates(T) - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod
- setElements(List<T>) - Method in class com.bakdata.dedupe.clustering.Cluster
-
The list of elements.
- setId(C) - Method in class com.bakdata.dedupe.clustering.Cluster
-
The identifier of the cluster.
- SetSimilarityMeasure<C extends java.util.Collection<? extends E>,E> - Interface in com.bakdata.dedupe.similarity
-
A SimilarityMeasure that is defined over Collections that are treated as sets.
- shortest() - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Selects all values with the same shortest length.
- SimilarityContext - Class in com.bakdata.dedupe.similarity
-
A similarity context captures exceptions in an ExceptionContext and provides additional configurations that represent cross-cutting concerns, such as null handling.
- SimilarityContext.SimilarityContextBuilder - Class in com.bakdata.dedupe.similarity
- SimilarityMeasure<T> - Interface in com.bakdata.dedupe.similarity
-
A SimilarityMeasure compares two values and calculates a similarity in [-1; 1], where 0 means no similarity and 1 equal values in a given context.
- similarityMeasureForNull(SimilarityMeasure<Object>) - Method in class com.bakdata.dedupe.similarity.SimilarityContext.SimilarityContextBuilder
- similarityMeasures(List<SimilarityMeasure<? super T>>) - Method in class com.bakdata.dedupe.similarity.AggregatingSimilarityMeasure.AggregatingSimilarityMeasureBuilder
- similarityScore(SimilarityScore<R>) - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Wraps a SimilarityScore of apache commons-text into a SimilarityMeasure.
- size() - Method in class com.bakdata.dedupe.clustering.Cluster
- sortingKey(SortingKey<T, ?>) - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod.OnlineSortedNeighborhoodMethodBuilder
-
Adds a new pass with the given sorting key and the OnlineSortedNeighborhoodMethod.OnlineSortedNeighborhoodMethodBuilder.defaultWindowSize(int).
- sortingKey(SortingKey<T, ?>, int) - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod.OnlineSortedNeighborhoodMethodBuilder
-
Adds a new pass with the given sorting key.
- SortingKey<T,K extends java.lang.Comparable<K>> - Class in com.bakdata.dedupe.candidate_selection
-
A sorting key allows a dataset to be indexed by a specific (calculated) value of a record, such that duplicates have a higher probability of being in close proximity and thus a CandidateSelection may prune the search space drastically.
- SortingKey(String, Function<T, K>) - Constructor for class com.bakdata.dedupe.candidate_selection.SortingKey
- sortingKeys(Iterable<SortingKey<T, ?>>) - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod.OnlineSortedNeighborhoodMethodBuilder
-
Adds new passes with the given list of sorting keys and the OnlineSortedNeighborhoodMethod.OnlineSortedNeighborhoodMethodBuilder.defaultWindowSize(int).
- sortingKeys(Iterable<SortingKey<T, ?>>, int) - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod.OnlineSortedNeighborhoodMethodBuilder
-
Adds new passes with the given list of sorting keys and the given window size.
- soundex() - Static method in class com.bakdata.dedupe.similarity.CommonTransformations
-
Creates a normalizer that turns strings into the soundex representation.
- source(Source) - Method in class com.bakdata.dedupe.fusion.ConflictResolutionFusion.ConflictResolutionFusionBuilder
- Source - Class in com.bakdata.dedupe.fusion
-
The source of a value to be fused.
- Source(String, double) - Constructor for class com.bakdata.dedupe.fusion.Source
- sourceExtractor(Function<R, String>) - Method in class com.bakdata.dedupe.fusion.ConflictResolutionFusion.ConflictResolutionFusionBuilder
- sources(Collection<? extends Source>) - Method in class com.bakdata.dedupe.fusion.ConflictResolutionFusion.ConflictResolutionFusionBuilder
- splitHandler(ClusterSplitHandler) - Method in class com.bakdata.dedupe.clustering.RefinedTransitiveClosure.RefinedTransitiveClosureBuilder
- stableMatching(SimilarityMeasure<E>) - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Will find the (weakly) stable matches between the left and the right side with a given pair similarity measure and calculate the average similarity between the stable pairs.
- storeValues(ResolutionTag<T>, List<AnnotatedValue<T>>) - Method in class com.bakdata.dedupe.fusion.FusionContext
- stream(Iterable<T>) - Static method in class com.bakdata.util.StreamUtil
- StreamUtil - Class in com.bakdata.util
-
Adds some functions that are missing in Java Streams.
- stringGenerator(String) - Static method in class com.bakdata.dedupe.clustering.ClusterIdGenerators
-
Returns an id generator that generates strings with a given prefix starting from 0.
- StronglyStableMarriage<T> - Class in com.bakdata.dedupe.matching
-
Implements a strongly stable matching based on the stable marriage with indifference.
- StronglyStableMarriage() - Constructor for class com.bakdata.dedupe.matching.StronglyStableMarriage
- sum() - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Calculates the sum of a bag of conflicting numbers.
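The rescaling described for scaleWithThreshold(double) above reduces to a small piece of arithmetic. The sketch below applies it to a raw similarity value; the formula (similarity - minExclusive) / (1 - minExclusive) is the straightforward linear map from (minExclusive, 1] to (0, 1] and is assumed to match the library's behavior. The class name is hypothetical.

```java
// Hypothetical helper, not part of com.bakdata.dedupe.
final class ScaleWithThresholdSketch {
    /** Maps values <= minExclusive to 0 and linearly rescales (minExclusive, 1] to (0, 1]. */
    static double scale(double similarity, double minExclusive) {
        if (similarity <= minExclusive) {
            return 0.0;
        }
        // similarity > minExclusive and similarity <= 1 imply minExclusive < 1,
        // so the denominator is positive for valid inputs
        return (similarity - minExclusive) / (1.0 - minExclusive);
    }

    public static void main(String[] args) {
        System.out.println(scale(0.75, 0.5)); // prints 0.5
        System.out.println(scale(0.5, 0.5));  // prints 0.0
        System.out.println(scale(1.0, 0.5));  // prints 1.0
    }
}
```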
T
- takeWhileInclusive(DoubleStream, DoublePredicate) - Static method in class com.bakdata.util.StreamUtil
- takeWhileInclusive(IntStream, IntPredicate) - Static method in class com.bakdata.util.StreamUtil
- takeWhileInclusive(LongStream, LongPredicate) - Static method in class com.bakdata.util.StreamUtil
- takeWhileInclusive(Stream<T>, Predicate<? super T>) - Static method in class com.bakdata.util.StreamUtil
- TerminalConflictResolution<I,O> - Interface in com.bakdata.dedupe.fusion
-
A conflict resolution function that is guaranteed to produce a single value.
- then(ConflictResolution<F, F>) - Method in class com.bakdata.dedupe.fusion.Merge.AdditionalFieldMergeBuilder
- then(ConflictResolution<I, J>) - Method in class com.bakdata.dedupe.fusion.Merge.IllTypedFieldMergeBuilder
- thresholdRule(String, SimilarityMeasure<? super T>, double) - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier.RuleBasedClassifierBuilder
-
Creates a threshold rule that is always applied.
- thresholdRule(String, BiPredicate<T, T>, SimilarityMeasure<? super T>, double) - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier.RuleBasedClassifierBuilder
-
Creates a threshold rule that is applied only when the precondition holds.
- toBuilder() - Method in class com.bakdata.dedupe.classifier.ClassifiedCandidate
- toSimilarity(EditDistance<R>) - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Used to translate EditDistance of apache commons-text into a similarity score by using the formula 1 - dist / maxDist where maxDist is the maximum length of the input strings.
- toString() - Method in class com.bakdata.dedupe.candidate_selection.CompositeValue
- toString() - Method in class com.bakdata.dedupe.candidate_selection.offline.OfflineCandidate
- toString() - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineCandidate
- toString() - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod.OnlineSortedNeighborhoodMethodBuilder
- toString() - Method in class com.bakdata.dedupe.candidate_selection.online.OnlineSortedNeighborhoodMethod
- toString() - Method in class com.bakdata.dedupe.candidate_selection.SortingKey
- toString() - Method in class com.bakdata.dedupe.classifier.ClassificationResult.ClassificationResultBuilder
- toString() - Method in class com.bakdata.dedupe.classifier.ClassificationResult
- toString() - Method in class com.bakdata.dedupe.classifier.ClassifiedCandidate.ClassifiedCandidateBuilder
- toString() - Method in class com.bakdata.dedupe.classifier.ClassifiedCandidate
- toString() - Method in class com.bakdata.dedupe.classifier.OracleClassifier
- toString() - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier.Rule
- toString() - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier.RuleBasedClassifierBuilder
- toString() - Method in class com.bakdata.dedupe.classifier.RuleBasedClassifier
- toString() - Method in class com.bakdata.dedupe.clustering.Cluster.ClusterBuilder
- toString() - Method in class com.bakdata.dedupe.clustering.Cluster
- toString() - Method in class com.bakdata.dedupe.clustering.ConsistentClustering.ConsistentClusteringBuilder
- toString() - Method in class com.bakdata.dedupe.clustering.ConsistentClustering
- toString() - Method in class com.bakdata.dedupe.clustering.OracleClustering
- toString() - Method in class com.bakdata.dedupe.clustering.RefineCluster.RefineClusterBuilder
- toString() - Method in class com.bakdata.dedupe.clustering.RefineCluster
- toString() - Method in class com.bakdata.dedupe.clustering.RefinedTransitiveClosure.RefinedTransitiveClosureBuilder
- toString() - Method in class com.bakdata.dedupe.clustering.RefinedTransitiveClosure
- toString() - Method in class com.bakdata.dedupe.clustering.TransitiveClosure
- toString() - Method in class com.bakdata.dedupe.clustering.TransitiveClosure.TransitiveClosureBuilder
- toString() - Method in class com.bakdata.dedupe.deduplication.online.FusingOnlineDuplicateDetection.FusingOnlineDuplicateDetectionBuilder
- toString() - Method in class com.bakdata.dedupe.deduplication.online.FusingOnlineDuplicateDetection
- toString() - Method in class com.bakdata.dedupe.duplicate_detection.online.OnlinePairBasedDuplicateDetection.OnlinePairBasedDuplicateDetectionBuilder
- toString() - Method in class com.bakdata.dedupe.duplicate_detection.online.OnlinePairBasedDuplicateDetection
- toString() - Method in class com.bakdata.dedupe.fusion.AnnotatedValue
- toString() - Method in class com.bakdata.dedupe.fusion.ConflictResolutionFusion.ConflictResolutionFusionBuilder
- toString() - Method in class com.bakdata.dedupe.fusion.ConflictResolutionFusion
- toString() - Method in class com.bakdata.dedupe.fusion.FusedValue
- toString() - Method in class com.bakdata.dedupe.fusion.FusionContext.FusionContextBuilder
- toString() - Method in class com.bakdata.dedupe.fusion.FusionContext
- toString() - Method in class com.bakdata.dedupe.fusion.Merge.AdditionalFieldMergeBuilder
- toString() - Method in class com.bakdata.dedupe.fusion.Merge.FieldMergeBuilder
- toString() - Method in class com.bakdata.dedupe.fusion.Merge.IllTypedFieldMergeBuilder
- toString() - Method in class com.bakdata.dedupe.fusion.Merge.MergeBuilder
- toString() - Method in class com.bakdata.dedupe.fusion.Merge
- toString() - Method in class com.bakdata.dedupe.fusion.ResolutionTag
- toString() - Method in class com.bakdata.dedupe.fusion.Source
- toString() - Method in class com.bakdata.dedupe.matching.StronglyStableMarriage
- toString() - Method in class com.bakdata.dedupe.matching.WeaklyStableMarriage
- toString() - Method in class com.bakdata.dedupe.matching.WeightedEdge
- toString() - Method in class com.bakdata.dedupe.similarity.AggregatingSimilarityMeasure.AggregatingSimilarityMeasureBuilder
- toString() - Method in class com.bakdata.dedupe.similarity.AggregatingSimilarityMeasure
- toString() - Method in class com.bakdata.dedupe.similarity.CachingSimilarity
- toString() - Method in class com.bakdata.dedupe.similarity.CutoffSimiliarityMeasure
- toString() - Method in class com.bakdata.dedupe.similarity.Levenshtein
- toString() - Method in class com.bakdata.dedupe.similarity.MatchingSimilarity
- toString() - Method in class com.bakdata.dedupe.similarity.SimilarityContext.SimilarityContextBuilder
- toString() - Method in class com.bakdata.dedupe.similarity.SimilarityContext
- toString() - Method in class com.bakdata.dedupe.similarity.TransformingSimilarityMeasure
- toString() - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure
- toString() - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedAggregatingSimilarityMeasureBuilder
- toString() - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedSimilarity
- toString() - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedValue
- toString() - Method in class com.bakdata.util.ExceptionContext
- toString() - Method in class com.bakdata.util.FunctionalClass
- toString() - Method in class com.bakdata.util.FunctionalConstructor
- toString() - Method in class com.bakdata.util.FunctionalMethod
- toString() - Method in class com.bakdata.util.FunctionalProperty
- transform(Function<? super T, ? extends R>) - Static method in class com.bakdata.dedupe.similarity.CommonTransformations
-
Wraps any function into a ValueTransformation.
- transform(Function<? super T, R>) - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Transforms the values within an AnnotatedValue.
- transform(T, SimilarityContext) - Method in interface com.bakdata.dedupe.similarity.ValueTransformation
-
Transforms the value to the expected output type.
- TransformingSimilarityMeasure<R,T> - Class in com.bakdata.dedupe.similarity
-
Applies a SimilarityMeasure to transformed input values.
- TransformingSimilarityMeasure(ValueTransformation<T, ? extends R>, SimilarityMeasure<R>) - Constructor for class com.bakdata.dedupe.similarity.TransformingSimilarityMeasure
- TransitiveClosure<C extends java.lang.Comparable<C>,T,I extends java.lang.Comparable<? super I>> - Class in com.bakdata.dedupe.clustering
-
An amortized linear transitive closure implementation over the number of pairs.
- TransitiveClosure.TransitiveClosureBuilder<C extends java.lang.Comparable<C>,T,I extends java.lang.Comparable<? super I>> - Class in com.bakdata.dedupe.clustering
- trigram() - Static method in class com.bakdata.dedupe.similarity.CommonTransformations
-
Creates a tokenizer that splits a string into trigrams, that is, sequences of three succeeding characters.
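As a rough illustration of the trigram tokenization described for trigram(), the following is a minimal standalone sketch. It does not use the library: the class TrigramSketch and the method trigrams are hypothetical names, and returning an empty list for inputs shorter than three characters is an assumption, since the index does not say how the library handles that case.

```java
import java.util.ArrayList;
import java.util.List;

public class TrigramSketch {
    // Hypothetical helper (not part of com.bakdata.dedupe): collects every
    // window of three succeeding characters, moving one character at a time.
    public static List<String> trigrams(String s) {
        List<String> result = new ArrayList<>();
        // Assumption: strings shorter than three characters yield no trigrams.
        for (int i = 0; i + 3 <= s.length(); i++) {
            result.add(s.substring(i, i + 3));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(trigrams("dedupe")); // prints [ded, edu, dup, upe]
    }
}
```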
U
- union() - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Retains all distinct values.
- unionAll() - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Retains all values.
- unionAll(Supplier<? extends R>) - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Adds all values to the given collection type.
- unknown() - Static method in interface com.bakdata.dedupe.duplicate_detection.PossibleDuplicateHandler
-
Treats possible duplicates as an unknown classification.
- unknown() - Static method in interface com.bakdata.dedupe.similarity.SimilarityMeasure
-
Returns a value indicating that the similarity value is unknown.
- UNKNOWN - com.bakdata.dedupe.classifier.Classification
-
Unknown classifications are primarily caused by a lack of information.
- unknownIf(DoublePredicate) - Method in interface com.bakdata.dedupe.similarity.SimilarityMeasure
-
Replaces the similarity returned by this measure so that all values for which the given predicate evaluates to true result in a SimilarityMeasure.unknown() similarity.
- unknownIfZero() - Method in interface com.bakdata.dedupe.similarity.SimilarityMeasure
-
Replaces the similarity returned by this measure so that all values =0 result in a SimilarityMeasure.unknown() similarity.
V
- valueOf(String) - Static method in enum com.bakdata.dedupe.classifier.Classification
-
Returns the enum constant of this type with the specified name.
- values() - Static method in enum com.bakdata.dedupe.classifier.Classification
-
Returns an array containing the constants of this enum type, in the order they are declared.
- ValueTransformation<T,R> - Interface in com.bakdata.dedupe.similarity
-
Performs a transformation on a value, especially on input values for SimilarityMeasure.
- vote() - Static method in class com.bakdata.dedupe.fusion.CommonConflictResolutions
-
Selects the highest weighted values.
W
- WeaklyStableMarriage<T> - Class in com.bakdata.dedupe.matching
-
Implements a weakly stable matching based on the stable marriage with indifference (i.e., ties).
- WeaklyStableMarriage() - Constructor for class com.bakdata.dedupe.matching.WeaklyStableMarriage
- WeightedAggregatingSimilarityMeasure<R> - Class in com.bakdata.dedupe.similarity
- WeightedAggregatingSimilarityMeasure.WeightedAggregatingSimilarityMeasureBuilder<R> - Class in com.bakdata.dedupe.similarity
- WeightedAggregatingSimilarityMeasure.WeightedSimilarity<T> - Class in com.bakdata.dedupe.similarity
- WeightedAggregatingSimilarityMeasure.WeightedValue - Class in com.bakdata.dedupe.similarity
- weightedAggregation(ToDoubleFunction<Stream<WeightedAggregatingSimilarityMeasure.WeightedValue>>) - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Starts the creation of a weighted aggregation of multiple SimilarityMeasures.
- weightedAverage() - Static method in class com.bakdata.dedupe.similarity.CommonSimilarityMeasures
-
Starts the creation of a weighted average of multiple SimilarityMeasures.
- WeightedEdge<T> - Class in com.bakdata.dedupe.matching
-
A weighted (directed) edge between two records.
- WeightedEdge(T, T, double) - Constructor for class com.bakdata.dedupe.matching.WeightedEdge
- weightedSimilarities(Collection<? extends WeightedAggregatingSimilarityMeasure.WeightedSimilarity<R>>) - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedAggregatingSimilarityMeasureBuilder
- weightedSimilarity(WeightedAggregatingSimilarityMeasure.WeightedSimilarity<R>) - Method in class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedAggregatingSimilarityMeasureBuilder
- WeightedSimilarity(double, SimilarityMeasure<T>) - Constructor for class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedSimilarity
- WeightedValue(double, double) - Constructor for class com.bakdata.dedupe.similarity.WeightedAggregatingSimilarityMeasure.WeightedValue
- with(ConflictResolution<F, F>) - Method in class com.bakdata.dedupe.fusion.Merge.FieldMergeBuilder
- with(ConflictResolution<F, F>, ConflictResolution<F, F>...) - Method in class com.bakdata.dedupe.fusion.Merge.FieldMergeBuilder
- withValue(S) - Method in class com.bakdata.dedupe.fusion.AnnotatedValue
-
Creates a new instance with a changed value and a potentially different type.
- womensFavoriteMen - Variable in class com.bakdata.dedupe.matching.AbstractStableMarriage.AbstractMatcher
- words() - Static method in class com.bakdata.dedupe.similarity.CommonTransformations
-
Splits a string on all whitespace characters into words.
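To make the weighted average named in the weightedAverage() entry concrete, here is a minimal standalone sketch of the arithmetic: the sum of weight times similarity, divided by the sum of weights. It does not call the library. WeightedAverageSketch and its method are hypothetical names, and the sketch assumes that every similarity is known, because this index does not describe how unknown similarities are aggregated.

```java
public class WeightedAverageSketch {
    // Hypothetical helper (not part of com.bakdata.dedupe): computes
    // sum(weight_i * similarity_i) / sum(weight_i).
    public static double weightedAverage(double[] weights, double[] similarities) {
        if (weights.length != similarities.length) {
            throw new IllegalArgumentException("weights and similarities must have the same length");
        }
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < weights.length; i++) {
            weightedSum += weights[i] * similarities[i];
            totalWeight += weights[i];
        }
        // Assumption: at least one weight is non-zero.
        return weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        // (1 * 1.0 + 3 * 0.5) / (1 + 3) = 2.5 / 4
        System.out.println(weightedAverage(new double[] {1, 3}, new double[] {1.0, 0.5})); // prints 0.625
    }
}
```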