The TupleMapFactory interface allows developers to plug in alternative implementations of a "tuple map"
used to back in-memory "join" and "co-group" operations. Typically these implementations are
"spillable": to avoid exhausting JVM memory, values are persisted to disk once some threshold is met
or an event is triggered.
The Map classes returned must take a Tuple as a key and a Collection of Tuples as a value. Further, Map.get(Object) must never return null; instead, the first call to get() for a given key must create and store an empty Collection. That is, Map.put(Object, Object) is never called on the map instance internally, only map.get(groupTuple).add(valuesTuple).
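The get-or-create contract described above can be illustrated with a minimal, in-memory sketch. It is not the Cascading implementation: GetOrCreateTupleMap is a hypothetical name, a plain List<Object> stands in for a Tuple, and nothing is spilled to disk.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;

// Hypothetical stand-in: List<Object> plays the role of a Tuple.
class GetOrCreateTupleMap extends HashMap<List<Object>, Collection<List<Object>>> {
    @Override
    @SuppressWarnings("unchecked")
    public Collection<List<Object>> get(Object key) {
        Collection<List<Object>> values = super.get(key);
        if (values == null) {
            // First lookup for this key: create and store an empty collection
            // so that callers never see null and never need to call put().
            values = new ArrayList<>();
            super.put((List<Object>) key, values);
        }
        return values;
    }
}
```

With such a map, a caller only ever writes map.get(groupTuple).add(valuesTuple); a lookup for an unseen key returns an empty collection that is already stored in the map.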
Using the TupleCollectionFactory to create the underlying Tuple Collections would allow that aspect to be pluggable as well.
If the Map implementation implements the Spillable interface, it will receive a Spillable.SpillListener instance that calls back to the appropriate logging mechanism for the platform. This instance should be passed down to any child Spillable types, namely an implementation of SpillableTupleList.
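The idea of passing a listener down to child collections can be sketched as follows. Every type here (SpillCallback, SpillAware, SketchSpillableList, SketchSpillableMap) is a hypothetical stand-in defined for illustration only; none of them are the Cascading Spillable or Spillable.SpillListener types, and the "spill" merely clears memory instead of writing to disk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; NOT the Cascading Spillable / SpillListener API.
interface SpillCallback {
    void onSpill(String source, int spilledSoFar);
}

interface SpillAware {
    void setSpillCallback(SpillCallback callback);
}

// Child collection that "spills" once it holds more than `threshold` values.
class SketchSpillableList implements SpillAware {
    private final List<Object> inMemory = new ArrayList<>();
    private final int threshold;
    private int spillCount = 0;
    private SpillCallback callback = (source, count) -> { }; // no-op by default

    SketchSpillableList(int threshold) { this.threshold = threshold; }

    @Override
    public void setSpillCallback(SpillCallback callback) { this.callback = callback; }

    void add(Object value) {
        inMemory.add(value);
        if (inMemory.size() > threshold) {
            inMemory.clear(); // a real implementation would write the values to disk first
            spillCount++;
            callback.onSpill("list", spillCount);
        }
    }
}

// Parent map that hands its callback down to every child list it owns.
class SketchSpillableMap implements SpillAware {
    private final Map<Object, SketchSpillableList> groups = new HashMap<>();
    private final int threshold;
    private SpillCallback callback = (source, count) -> { };

    SketchSpillableMap(int threshold) { this.threshold = threshold; }

    @Override
    public void setSpillCallback(SpillCallback callback) {
        this.callback = callback;
        // Update children that were created before the callback was set.
        for (SketchSpillableList child : groups.values()) {
            child.setSpillCallback(callback);
        }
    }

    SketchSpillableList get(Object key) {
        return groups.computeIfAbsent(key, k -> {
            SketchSpillableList child = new SketchSpillableList(threshold);
            child.setSpillCallback(callback); // pass the listener down on creation
            return child;
        });
    }
}
```

In this sketch the platform would register a single callback on the map, and spills that occur inside any child list are reported through that same callback.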
The default implementation for the Hadoop platform is the HadoopTupleMapFactory, which creates a HadoopSpillableTupleMap instance.
The class SpillableTupleMap may be used as a base class.