public static class Unique.FilterPartialDuplicates extends BaseOperation<java.util.LinkedHashMap<Tuple,java.lang.Object>> implements Filter<java.util.LinkedHashMap<Tuple,java.lang.Object>>
A Filter that removes observed duplicates from the tuple stream.
This class is typically used in tandem with a First Aggregator to improve de-duping performance by removing as many duplicate values as possible before the intermediate GroupBy operator.
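The threshold value is used to maintain an LRU cache of constant size. If more than threshold unique values are seen, the oldest cached values are evicted from the cache, so duplicates of an evicted value may pass through the filter again.
See Also: Unique, Serialized Form
Fields inherited from class BaseOperation: fieldDeclaration, numArgs, trace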
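The bounded-cache behavior described above can be illustrated with a small stand-alone sketch that uses only the JDK. It is not the Cascading implementation: the class `PartialDedupSketch`, its nested `PartialDuplicateFilter`, and the helpers `filterPartial` and `firstPerGroup` are hypothetical names, and the use of an access-ordered `java.util.LinkedHashMap` with `removeEldestEntry` is an assumption about how such an LRU could be kept. The second helper stands in for the exact de-duplication that a downstream GroupBy with a First Aggregator would perform.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; not the Cascading implementation.
class PartialDedupSketch {

    // Remembers at most `threshold` recently seen values (assumed LRU policy).
    static final class PartialDuplicateFilter<T> {
        private final Map<T, Boolean> seen;

        PartialDuplicateFilter(final int threshold) {
            // accessOrder = true: re-seeing a value refreshes its position
            this.seen = new LinkedHashMap<T, Boolean>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<T, Boolean> eldest) {
                    return size() > threshold; // evict once the cache overflows
                }
            };
        }

        // Mirrors the Filter contract: true means "remove this value".
        boolean isRemove(T value) {
            // put returns the previous mapping, or null if the value was not cached
            return seen.put(value, Boolean.TRUE) != null;
        }
    }

    // Upstream stage: drops only the duplicates that are still in the cache.
    static <T> List<T> filterPartial(List<T> input, int threshold) {
        PartialDuplicateFilter<T> filter = new PartialDuplicateFilter<>(threshold);
        List<T> kept = new ArrayList<>();
        for (T value : input) {
            if (!filter.isRemove(value)) {
                kept.add(value);
            }
        }
        return kept;
    }

    // Downstream stand-in for GroupBy plus First: keeps the first value per group.
    static <T> List<T> firstPerGroup(List<T> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "a", "c", "a", "b");
        List<String> partial = filterPartial(input, 2);
        System.out.println(partial);                // prints [a, b, c, b]
        System.out.println(firstPerGroup(partial)); // prints [a, b, c]
    }
}
```

With a threshold of 2, the second "b" slips through because "b" was evicted when "c" arrived; the downstream stage still produces the exact unique set. A larger threshold removes more duplicates early at the cost of more memory per task.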
Constructor and Description |
---|
Unique.FilterPartialDuplicates(): Constructor FilterPartialDuplicates creates a new FilterPartialDuplicates instance. |
Unique.FilterPartialDuplicates(int threshold): Constructor FilterPartialDuplicates creates a new FilterPartialDuplicates instance. |
Unique.FilterPartialDuplicates(Unique.Include include, int threshold): Constructor FilterPartialDuplicates creates a new FilterPartialDuplicates instance. |
Modifier and Type | Method and Description |
---|---|
void | cleanup(FlowProcess flowProcess, OperationCall<java.util.LinkedHashMap<Tuple,java.lang.Object>> operationCall): Method cleanup does nothing, and may safely be overridden. |
boolean | equals(java.lang.Object object) |
int | hashCode() |
boolean | isRemove(FlowProcess flowProcess, FilterCall<java.util.LinkedHashMap<Tuple,java.lang.Object>> filterCall): Method isRemove returns true if input should be removed from the tuple stream. |
void | prepare(FlowProcess flowProcess, OperationCall<java.util.LinkedHashMap<Tuple,java.lang.Object>> operationCall): Method prepare does nothing, and may safely be overridden. |
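Methods inherited from class BaseOperation: flush, getFieldDeclaration, getNumArgs, getTrace, isSafe, printOperationInternal, toString, toStringInternal
Methods inherited from class java.lang.Object: clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface Operation: flush, getFieldDeclaration, getNumArgs, isSafe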
public Unique.FilterPartialDuplicates()
@ConstructorProperties(value="threshold") public Unique.FilterPartialDuplicates(int threshold)
threshold - of type int
@ConstructorProperties(value={"include","threshold"}) public Unique.FilterPartialDuplicates(Unique.Include include, int threshold)
threshold - of type int
public void prepare(FlowProcess flowProcess, OperationCall<java.util.LinkedHashMap<Tuple,java.lang.Object>> operationCall)
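Description copied from class BaseOperation: Method prepare does nothing, and may safely be overridden.
public boolean isRemove(FlowProcess flowProcess, FilterCall<java.util.LinkedHashMap<Tuple,java.lang.Object>> filterCall)
Description copied from interface Filter: Method isRemove returns true if input should be removed from the tuple stream.
public void cleanup(FlowProcess flowProcess, OperationCall<java.util.LinkedHashMap<Tuple,java.lang.Object>> operationCall)
Description copied from class BaseOperation: Method cleanup does nothing, and may safely be overridden.
public boolean equals(java.lang.Object object)
Overrides: equals in class BaseOperation<java.util.LinkedHashMap<Tuple,java.lang.Object>>
public int hashCode()
Overrides: hashCode in class BaseOperation<java.util.LinkedHashMap<Tuple,java.lang.Object>>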