class BatchAccumuloInputFormat extends InputFormatBase[Key, Value] with LazyLogging
This input format uses the Accumulo TabletLocator to create one InputSplit for each tablet that contains
records from the specified ranges, unlike AccumuloInputFormat, which creates a single split per Range.
The resulting MultiRangeInputSplits are intended to be read with a BatchScanner. This drastically reduces the number of
splits, and consequently Spark tasks, produced by this InputFormat. Locality is preserved because a tablet
can be hosted by only one tablet server at a given time.
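The difference between per-range and per-tablet splitting can be illustrated with a toy model. This is only a sketch, not the class's implementation: `KeyRange`, `Tablet`, `MultiRangeSplit` and `SplitPlanner` are hypothetical names, integers stand in for row keys, and clipping each range to the tablet's extent is an assumption made for the illustration.

```scala
// Toy model: integers stand in for row keys; all intervals are half-open [start, end).
final case class KeyRange(start: Int, end: Int)
final case class Tablet(id: String, server: String, extent: KeyRange)
final case class MultiRangeSplit(tablet: Tablet, ranges: Seq[KeyRange])

object SplitPlanner {
  // Portion of a requested range that falls inside a tablet's extent, if any.
  private def clip(r: KeyRange, extent: KeyRange): Option[KeyRange] = {
    val s = math.max(r.start, extent.start)
    val e = math.min(r.end, extent.end)
    if (s < e) Some(KeyRange(s, e)) else None
  }

  // One split per tablet that holds data for at least one requested range.
  def perTablet(tablets: Seq[Tablet], ranges: Seq[KeyRange]): Seq[MultiRangeSplit] =
    tablets.flatMap { t =>
      val clipped = ranges.flatMap(r => clip(r, t.extent))
      if (clipped.isEmpty) None else Some(MultiRangeSplit(t, clipped))
    }
}

object SplitPlannerDemo {
  val tablets = Seq(
    Tablet("t1", "tserver-1", KeyRange(0, 100)),
    Tablet("t2", "tserver-2", KeyRange(100, 200)),
    Tablet("t3", "tserver-3", KeyRange(200, 300))
  )
  val ranges = Seq(KeyRange(5, 10), KeyRange(20, 30), KeyRange(95, 105), KeyRange(150, 160))

  // Four requested ranges collapse into two splits, one for each tablet that is touched.
  val splits: Seq[MultiRangeSplit] = SplitPlanner.perTablet(tablets, ranges)
}
```

In this model each split carries a single tablet, and therefore a single hosting server, which is what lets the split report one preferred location.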
Because the RecordReader uses a BatchScanner, several modes are not supported: Offline, Isolated, and Local Iterators.
These modes are backed by specialized scanners that only support scanning a single range.
We borrow some Accumulo machinery to set and read configuration, so classOf[AccumuloInputFormat] should be used
when modifying the Configuration, as if AccumuloInputFormat were going to be used.
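As a rough sketch of what this means in practice, a job can be configured through the public static setters that Accumulo 1.x exposes for AccumuloInputFormat. The sketch assumes an Accumulo 1.6 to 1.9 client and Hadoop on the classpath; `ConfigureScan` and its parameters are hypothetical names, and the setters are called on `AbstractInputFormat` and `InputFormatBase` because Scala does not resolve inherited Java statics through a subclass.

```scala
import org.apache.accumulo.core.client.ClientConfiguration
import org.apache.accumulo.core.client.mapreduce.{AbstractInputFormat, InputFormatBase}
import org.apache.accumulo.core.client.security.tokens.PasswordToken
import org.apache.accumulo.core.data.{Range => ARange}
import org.apache.hadoop.mapreduce.Job
import scala.collection.JavaConverters._

// Hypothetical helper: writes the scan settings into the job's Configuration
// exactly as one would when preparing a job for AccumuloInputFormat.
object ConfigureScan {
  def configure(
    job: Job,
    instance: String,
    zookeepers: String,
    user: String,
    password: String,
    table: String,
    ranges: Seq[ARange]
  ): Unit = {
    val clientConfig = ClientConfiguration.loadDefault()
      .withInstance(instance)
      .withZkHosts(zookeepers)

    // Connection settings.
    AbstractInputFormat.setZooKeeperInstance(job, clientConfig)
    AbstractInputFormat.setConnectorInfo(job, user, new PasswordToken(password))

    // Table and ranges to scan.
    InputFormatBase.setInputTableName(job, table)
    InputFormatBase.setRanges(job, ranges.asJava)
  }
}
```

The configured `job.getConfiguration` would then presumably be passed to Spark's `newAPIHadoopRDD` together with `classOf[BatchAccumuloInputFormat]`, `classOf[Key]` and `classOf[Value]`.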
This class uses the internal Accumulo API and will likely not work across versions.
WARNING: The locality of the splits relies on reverse resolution of tserver IPs matching those of the Spark workers.
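One way to check this precondition on a given cluster is to reverse-resolve each tserver IP and compare the result with the names the Spark workers report. The sketch below uses only `java.net.InetAddress`; `LocalityCheck`, `reverseLookup` and `mismatched` are hypothetical names, and comparing against worker hostnames is an assumption about how the match is made.

```scala
import java.net.InetAddress
import scala.util.Try

object LocalityCheck {
  // Reverse-resolve an IP address. getCanonicalHostName falls back to the
  // textual IP when no name is found, so that case is treated as "no result".
  def reverseLookup(ip: String): Option[String] =
    Try(InetAddress.getByName(ip).getCanonicalHostName).toOption.filter(_ != ip)

  // tserver IPs whose reverse-resolved name is not among the known worker hosts.
  def mismatched(tserverIps: Seq[String], workerHosts: Set[String]): Seq[String] =
    tserverIps.filterNot(ip => reverseLookup(ip).exists(workerHosts.contains))
}
```

Any IP returned by `mismatched` would point at a tablet server whose splits are unlikely to be scheduled on a local worker.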
Linear Supertypes
LazyLogging, Logging, InputFormatBase[Key, Value], InputFormat[Key, Value], AnyRef, Any