public class HadoopFlow extends BaseFlow<org.apache.hadoop.mapred.JobConf>
Class HadoopFlow is the Apache Hadoop specific implementation of a Flow.
HadoopFlow must be created through a HadoopFlowConnector instance.
If classpath paths are provided on the FlowDef, the Hadoop distributed cache mechanism will be used to augment the remote classpath.
Any path elements that are relative will be uploaded to HDFS, and the HDFS URI will be used on the JobConf. Note that all paths are added as "files" to the JobConf, not archives, so they aren't needlessly uncompressed cluster side.
See Also: HadoopFlowConnector
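The classpath rule described above can be pictured with a small standalone model. This is a sketch only and not Cascading's code: it assumes that "relative" means "has no URI scheme", and the staging directory `hdfs://namenode/tmp/staging/` and the class name `ClasspathSketch` are hypothetical names introduced for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only; none of these names come from Cascading or Hadoop.
class ClasspathSketch {
  // Hypothetical staging directory; the real upload location is chosen by the platform.
  static final URI STAGING = URI.create("hdfs://namenode/tmp/staging/");

  // Returns the URIs that would be registered as plain "files" (not archives).
  static List<URI> resolve(List<String> classpath) {
    List<URI> resolved = new ArrayList<>();
    for (String element : classpath) {
      URI uri = URI.create(element);
      if (uri.isAbsolute()) {
        // Assumption: an element that already carries a scheme is used as-is.
        resolved.add(uri);
      } else {
        // Assumption: a scheme-less element is "relative", so it is uploaded
        // and replaced by its URI under the staging directory.
        String fileName = element.substring(element.lastIndexOf('/') + 1);
        resolved.add(STAGING.resolve(fileName));
      }
    }
    return resolved;
  }

  public static void main(String[] args) {
    List<String> classpath = Arrays.asList("lib/udfs.jar", "hdfs://namenode/shared/common.jar");
    for (URI uri : resolve(classpath)) {
      System.out.println(uri);
    }
  }
}
```

In this model the jar is registered by URI and left packed, which mirrors the "files, not archives" note.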
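Nested classes inherited from class BaseFlow: BaseFlow.FlowHolder
Fields inherited from class BaseFlow: flowStats, sinks, sources, stop, stopJobsOnExit, thread
Fields inherited from interface Flow: CASCADING_FLOW_ID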
Modifier | Constructor and Description |
---|---|
protected | HadoopFlow() |
public | HadoopFlow(PlatformInfo platformInfo, java.util.Map<java.lang.Object,java.lang.Object> properties, org.apache.hadoop.mapred.JobConf jobConf, FlowDef flowDef) |
protected | HadoopFlow(PlatformInfo platformInfo, java.util.Map<java.lang.Object,java.lang.Object> properties, org.apache.hadoop.mapred.JobConf jobConf, java.lang.String name) |
Modifier and Type | Method and Description |
---|---|
org.apache.hadoop.mapred.JobConf | getConfig(): Method getConfig returns the internal configuration object. |
java.util.Map<java.lang.Object,java.lang.Object> | getConfigAsProperties(): Method getConfigAsProperties converts the internal configuration object into a Map of key value pairs. |
org.apache.hadoop.mapred.JobConf | getConfigCopy(): Method getConfigCopy returns a copy of the internal configuration object. |
FlowProcess<org.apache.hadoop.mapred.JobConf> | getFlowProcess() |
protected int | getMaxNumParallelSteps() |
java.lang.String | getProperty(java.lang.String key): Method getProperty returns the value associated with the given key from the underlying properties system. |
protected void | initConfig(java.util.Map<java.lang.Object,java.lang.Object> properties, org.apache.hadoop.mapred.JobConf parentConfig): This method creates a new internal Config with the parentConfig as defaults using the properties to override the defaults. |
protected void | initFromProperties(java.util.Map<java.lang.Object,java.lang.Object> properties) |
protected void | internalClean(boolean stop) |
protected void | internalShutdown() |
protected void | internalStart() |
boolean | isPreserveTemporaryFiles(): Method isPreserveTemporaryFiles returns false if temporary files will be cleaned when this Flow completes. |
protected org.apache.hadoop.mapred.JobConf | newConfig(org.apache.hadoop.mapred.JobConf defaultConfig) |
protected void | setConfigProperty(org.apache.hadoop.mapred.JobConf config, java.lang.Object key, java.lang.Object value) |
boolean | stepsAreLocal(): Method stepsAreLocal returns true if all jobs are executed in-process as a single map and reduce task. |
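The difference between getConfig(), getConfigCopy(), getConfigAsProperties() and getProperty(key) is easier to see in a small standalone model. The class `ConfigHolderSketch` below is hypothetical and uses java.util.Properties in place of JobConf; it only mirrors the behaviour described in the summary above and is not how HadoopFlow is implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Stand-in for a flow that owns a configuration; not a Cascading class.
class ConfigHolderSketch {
  private final Properties config = new Properties();

  ConfigHolderSketch(Map<String, String> initial) {
    config.putAll(initial);
  }

  // Like getConfig(): hands out the internal object itself.
  Properties getConfig() {
    return config;
  }

  // Like getConfigCopy(): an independent copy, safe to modify.
  Properties getConfigCopy() {
    Properties copy = new Properties();
    copy.putAll(config);
    return copy;
  }

  // Like getConfigAsProperties(): the same entries as a Map of key value pairs
  // (a snapshot in this sketch, which is an assumption).
  Map<Object, Object> getConfigAsProperties() {
    return new HashMap<>(config);
  }

  // Like getProperty(key): a single value lookup, null when the key is absent.
  String getProperty(String key) {
    return config.getProperty(key);
  }

  public static void main(String[] args) {
    Map<String, String> initial = new HashMap<>();
    initial.put("example.key", "4"); // arbitrary illustrative key
    ConfigHolderSketch holder = new ConfigHolderSketch(initial);

    Properties copy = holder.getConfigCopy();
    copy.setProperty("example.key", "8"); // only the copy changes

    System.out.println(holder.getProperty("example.key")); // prints 4
    System.out.println(copy.getProperty("example.key"));   // prints 8
  }
}
```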
Methods inherited from class BaseFlow: addListener, addStepListener, areSinksStale, areSourcesNewer, cleanup, complete, createConfig, createFlowThread, deleteCheckpointsIfNotUpdate, deleteCheckpointsIfReplace, deleteSinks, deleteSinksIfNotUpdate, deleteSinksIfReplace, deleteTrapsIfNotUpdate, deleteTrapsIfReplace, fireOnCompleted, fireOnStarting, fireOnStopping, fireOnThrowable, getCascadeID, getCascadingServices, getCheckpointNames, getCheckpoints, getCheckpointsCollection, getClassPath, getFieldsFor, getFlowSession, getFlowSkipStrategy, getFlowStats, getFlowSteps, getFlowStepStrategy, getHolder, getID, getName, getPlatformInfo, getRunID, getSink, getSink, getSinkModified, getSinkNames, getSinks, getSinksCollection, getSource, getSourceNames, getSources, getSourcesCollection, getSpawnStrategy, getStats, getSubmitPriority, getTags, getTrapNames, getTraps, getTrapsCollection, handleExecutorShutdown, hasListeners, hasStepListeners, initialize, initializeNewJobsMap, initSteps, internalStopAllJobs, isSkipFlow, isStopJobsOnExit, logInfo, openSink, openSink, openSource, openSource, openTapForRead, openTapForWrite, openTrap, openTrap, prepare, presentSinkFields, presentSourceFields, registerShutdownHook, removeListener, removeStepListener, resourceExists, retrieveSinkFields, retrieveSourceFields, setCascade, setCheckpoints, setFlowSkipStrategy, setFlowStepGraph, setFlowStepStrategy, setName, setSinks, setSources, setSpawnStrategy, setSubmitPriority, setTraps, start, stop, toString, updateSchemes, writeDOT, writeStepsDOT
protected HadoopFlow()
protected HadoopFlow(PlatformInfo platformInfo, java.util.Map<java.lang.Object,java.lang.Object> properties, org.apache.hadoop.mapred.JobConf jobConf, java.lang.String name)
public HadoopFlow(PlatformInfo platformInfo, java.util.Map<java.lang.Object,java.lang.Object> properties, org.apache.hadoop.mapred.JobConf jobConf, FlowDef flowDef)
protected void initFromProperties(java.util.Map<java.lang.Object,java.lang.Object> properties)
initFromProperties in class BaseFlow<org.apache.hadoop.mapred.JobConf>
protected void initConfig(java.util.Map<java.lang.Object,java.lang.Object> properties, org.apache.hadoop.mapred.JobConf parentConfig)
Description copied from class: BaseFlow
This method creates a new internal Config with the parentConfig as defaults using the properties to override the defaults.
initConfig in class BaseFlow<org.apache.hadoop.mapred.JobConf>
Parameters:
properties - of type Map
parentConfig - of type Config
protected void setConfigProperty(org.apache.hadoop.mapred.JobConf config, java.lang.Object key, java.lang.Object value)
setConfigProperty in class BaseFlow<org.apache.hadoop.mapred.JobConf>
protected org.apache.hadoop.mapred.JobConf newConfig(org.apache.hadoop.mapred.JobConf defaultConfig)
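The layering performed by initConfig (a new configuration seeded from parentConfig, with properties overriding the defaults) can be modelled with plain maps. This is a minimal sketch under stated assumptions: `ConfigLayeringSketch` and `layer` are hypothetical names, string maps stand in for JobConf, and skipping null entries is a choice made for this example rather than documented behaviour.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of "parent config as defaults, properties as overrides".
class ConfigLayeringSketch {
  static Map<String, String> layer(Map<String, String> parentConfig, Map<Object, Object> properties) {
    // Start from a copy so the parent configuration is never modified.
    Map<String, String> config = new HashMap<>(parentConfig);
    for (Map.Entry<Object, Object> entry : properties.entrySet()) {
      if (entry.getKey() == null || entry.getValue() == null) {
        continue; // assumption: null keys and values are ignored in this sketch
      }
      // A property with the same key replaces the inherited default.
      config.put(entry.getKey().toString(), entry.getValue().toString());
    }
    return config;
  }

  public static void main(String[] args) {
    Map<String, String> parent = new HashMap<>();
    parent.put("a", "1");
    parent.put("b", "2");

    Map<Object, Object> properties = new HashMap<>();
    properties.put("b", "3"); // overrides the parent's value
    properties.put("c", 4);   // non-String values are stringified

    Map<String, String> merged = layer(parent, properties);
    System.out.println(merged.get("a")); // prints 1
    System.out.println(merged.get("b")); // prints 3
    System.out.println(merged.get("c")); // prints 4
  }
}
```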
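public org.apache.hadoop.mapred.JobConf getConfig()
Description copied from interface: Flow
Method getConfig returns the internal configuration object. See FlowConnector for setting default properties visible to children. Or see FlowStepStrategy for setting properties on individual steps before they are executed.
public org.apache.hadoop.mapred.JobConf getConfigCopy()
Description copied from interface: Flow
Method getConfigCopy returns a copy of the internal configuration object.
public java.util.Map<java.lang.Object,java.lang.Object> getConfigAsProperties()
Description copied from interface: Flow
Method getConfigAsProperties converts the internal configuration object into a Map of key value pairs.
public java.lang.String getProperty(java.lang.String key)
Method getProperty returns the value associated with the given key from the underlying properties system.
Parameters:
key - of type String
public FlowProcess<org.apache.hadoop.mapred.JobConf> getFlowProcess()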
public boolean isPreserveTemporaryFiles()
Method isPreserveTemporaryFiles returns false if temporary files will be cleaned when this Flow completes.
protected void internalStart()
internalStart in class BaseFlow<org.apache.hadoop.mapred.JobConf>
public boolean stepsAreLocal()
Description copied from interface: Flow
Method stepsAreLocal returns true if all jobs are executed in-process as a single map and reduce task.
protected void internalClean(boolean stop)
internalClean in class BaseFlow<org.apache.hadoop.mapred.JobConf>
protected void internalShutdown()
internalShutdown in class BaseFlow<org.apache.hadoop.mapred.JobConf>
protected int getMaxNumParallelSteps()
getMaxNumParallelSteps in class BaseFlow<org.apache.hadoop.mapred.JobConf>