This documentation provides an overview of possible settings for Stratosphere.
All configuration is done in `conf/stratosphere-conf.yaml`, which is expected to be a flat collection of YAML key-value pairs in the format `key: value`.
The system and run scripts parse the config at startup and override the respective default values with the given values for every key that has been set. This page contains a reference for all configuration keys used in the system.
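As a minimal sketch of the format described above, a `conf/stratosphere-conf.yaml` might look like the following. The keys are taken from the reference on this page; the values are illustrative only, and the address `10.0.0.1` is a made-up example.

```yaml
# Flat "key: value" pairs, one per line. The dotted key names are plain
# strings, not nested YAML mappings.
jobmanager.rpc.address: 10.0.0.1   # hypothetical address
jobmanager.rpc.port: 6123          # same as the default
taskmanager.heap.mb: 1024          # overrides the default of 512
```

Keys that are not listed in the file keep their default values.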
Key | Description | Default Value |
---|---|---|
env.java.home | The path to the Java installation to use. | The system's default Java installation |
jobmanager.rpc.address | The IP address of the JobManager. | localhost |
jobmanager.rpc.port | The port number of the JobManager. | 6123 |
jobmanager.profiling.enable | Whether the job manager's profiling component is enabled (required for load charts in NepheleGUI). | false |
jobmanager.web.port | Port of the JobManager's web interface. | 8081 |
jobmanager.heap.mb | JVM heap size in megabytes of the JobManager. | 256 |
taskmanager.heap.mb | JVM heap size in megabytes of the TaskManager. | 512 |
taskmanager.tmp.dirs | The directory for temporary files, or a list of directories separated by the system's directory delimiter (for example ':' (colon) on Linux/Unix). If multiple directories are specified, the temporary files are distributed across the directories in a round-robin fashion. The I/O manager component spawns one reading and one writing thread per directory. A directory may be listed multiple times to have the I/O manager use multiple threads for it (for example, if it is physically stored on a very fast disk or RAID). | The system's tmp dir |
taskmanager.memory.size | The amount of memory available for the task manager's memory manager, in megabytes. The memory manager distributes memory among the different tasks, which need it for sorting, hash tables, or result caching. If unspecified (-1), the memory manager takes a fixed ratio (0.8) of the heap memory that is available to the JVM after all Nephele services have started. | -1 |
channel.network.numberOfBuffers | The default number of buffers available to the system. This number determines how many channels a TaskManager can have at the same time and how well buffered the channels are. If a job is rejected or you get a warning that the system has too few buffers available, increase this value. | 256 |
pact.parallelization.degree | The default degree of parallelism to use for pact programs that have no degree of parallelism specified. A value of -1 indicates no limit, in which case the degree of parallelism is set to the number of available instances (at the time of compilation) times the intra-node parallelism of each instance. | -1 |
pact.parallelization.max-intra-node-degree | The maximal number of parallel instances of the user function that are assigned to a single computing instance. A value of -1 indicates no limit. If the desired degree of parallelism is not achievable with the given number of instances and the given upper limit per instance, then the degree of parallelism may be reduced by the compiler. | -1 |
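The temporary-directory and memory settings above could be combined as in the following sketch. The directory paths are hypothetical, and the heap size is an arbitrary example value.

```yaml
taskmanager.heap.mb: 4096

# Two physical locations; /data2/tmp is listed twice so that the I/O manager
# uses multiple threads for it (for example, because it sits on a fast RAID).
# ':' is the directory delimiter on Linux/Unix.
taskmanager.tmp.dirs: /data1/tmp:/data2/tmp:/data2/tmp

# -1 leaves the size unspecified: the memory manager takes a fixed ratio (0.8)
# of the heap that is available after all Nephele services have started.
taskmanager.memory.size: -1
```

Setting `taskmanager.memory.size` to a positive number of megabytes instead would fix the memory manager's share explicitly.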
The following parameters configure Nephele's JobManager, TaskManager, and runtime channel management.
Key | Description | Default Value |
---|---|---|
job.execution.retries | The default number of times a vertex is re-executed before its execution is considered failed. | 2 |
jobclient.polling.interval | The recommended client polling interval (seconds). | 5 |
jobmanager.rpc.address | The IP address of the JobManager. | localhost |
jobmanager.rpc.port | The port number of the JobManager. | 6123 |
jobmanager.rpc.numhandler | The number of RPC threads for the JobManager. | 3 |
jobmanager.profiling.enable | Whether the profiler is used. | false |
jobmanager.web.port | Port of the JobManager's web interface. | 8081 |
jobmanager.heap.mb | JVM heap size in megabytes of the JobManager. | 256 |
taskmanager.heap.mb | JVM heap size in megabytes of the TaskManager. | 512 |
taskmanager.rpc.port | The task manager's IPC port. | 6122 |
taskmanager.data.port | The task manager's data port used for NETWORK channels. | 6121 |
taskmanager.setup.periodictaskinterval | The interval of periodic tasks (Heartbeat, check Task Execution) by the TaskManager. The interval is measured in milliseconds. | 1000 |
taskmanager.memory.size | The amount of memory available for the task manager's memory manager, in megabytes. The memory manager distributes memory among the different tasks, which need it for sorting, hash tables, or result caching. If unspecified (-1), the memory manager takes a fixed ratio (0.8) of the heap memory that is available to the JVM after all Nephele services have started. | -1 |
taskmanager.tmp.dirs | The directory for temporary files, or a list of directories separated by the system's directory delimiter (for example ':' (colon) on Linux/Unix). If multiple directories are specified, the temporary files are distributed across the directories in a round-robin fashion. The I/O manager component spawns one reading and one writing thread per directory. A directory may be listed multiple times to have the I/O manager use multiple threads for it (for example, if it is physically stored on a very fast disk or RAID). | The system's tmp dir |
channel.network.numberOfBuffers | The default number of buffers available to the system. This number determines how many channels a TaskManager can have at the same time and how well buffered the channels are. If a job is rejected or you get a warning that the system has too few buffers available, increase this value. | 256 |
channel.network.bufferSizeInBytes | The default size of each network buffer, in bytes. | 64 * 1024 (64 k) |
channel.network.numberOfOutgoingConnectionThreads | The default number of outgoing connection threads. | 1 |
channel.network.numberOfConnectionRetries | The default number of connection retries. | 10 |
channel.network.allowSenderSideSpilling | Enables/disables spilling of network channels. Spilling happens if a task does not consume data as fast as it arrives on the network. Spilling can prevent deadlocks but can cause bad performance. | false |
channel.network.mergeSpilledBuffers | Enables/disables merging of spilled buffers. | true |
channel.inMemory.numberOfConnectionRetries | The number of connection retries. | 30 |
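If a job is rejected because too few buffers are available, the network channel settings could be adjusted along the lines of this sketch. The value 2048 is an arbitrary example, not a recommendation from the source.

```yaml
# Raise the number of buffers from the default of 256.
channel.network.numberOfBuffers: 2048

# Keep sender-side spilling at its default; enabling it can prevent
# deadlocks but can cause bad performance.
channel.network.allowSenderSideSpilling: false
```

As a rough estimate, and assuming each buffer is allocated at the default size of 64 * 1024 bytes, 2048 buffers correspond to 128 MiB of buffer space, so the TaskManager heap may need to be sized accordingly.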
The following parameters configure the PACT compiler and have therefore an impact on the scheduling and execution of PACT programs on Nephele.
Key | Description | Default Value |
---|---|---|
pact.parallelization.degree | The default degree of parallelism to use for pact programs that have no degree of parallelism specified. A value of -1 indicates no limit, in which case the degree of parallelism is set to the number of available instances at the time of compilation. | -1 |
pact.parallelization.max-intra-node-degree | The maximal number of parallel instances of the user function that are assigned to a single computing instance. A value of -1 indicates no limit. If the desired degree of parallelism is not achievable with the given number of instances and the given upper limit per instance, then the degree of parallelism may be reduced by the compiler. | 1 |
pact.parallelization.maxmachines | An optional hard limit on the number of machines (Nephele instances) to use. A program will never use more than the number of machines specified here. If set to '-1', the limit is set by the maximal number of instances available in the cluster. If this value is set, the actual number of machines used for certain tasks may be even lower than this value, due to scheduling constraints. | -1 |
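The three compiler settings interact, which the following sketch tries to illustrate. The numbers are example values chosen so that they are consistent with each other.

```yaml
# Default degree of parallelism for programs that do not specify one.
pact.parallelization.degree: 8

# At most 2 parallel instances of a user function per computing instance.
pact.parallelization.max-intra-node-degree: 2

# Never use more than 4 machines.
pact.parallelization.maxmachines: 4
```

With these values, 4 machines times 2 parallel instances per machine gives exactly the requested degree of 8. If fewer instances were available, the desired degree would not be achievable under the per-instance limit, and the compiler may reduce it as described in the table above.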
These parameters configure the PACT web interface. For information on how to start, use, and configure the PACT web interface, refer to its documentation.
Key | Description | Default Value |
---|---|---|
pact.web.port | The port of the frontend web server. | 8080 |
pact.web.rootpath | The path to the root directory containing the web documents | ./resources/web-docs/ |
pact.web.temp | The temp directory for the web server. Used for example for caching file fragments during file-uploads. | /tmp |
pact.web.uploaddir | The directory into which the web server will store uploaded pact programs. | /tmp/pact-jobs/ |
pact.web.plandump | The directory into which the web server will dump temporary JSON files describing pact plans. | /tmp/pact-plans/ |
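For example, if port 8080 is already taken on the host, or if uploaded programs should not live under `/tmp`, the web interface settings could be overridden as sketched here. The port number and the `/var/stratosphere/...` paths are hypothetical.

```yaml
pact.web.port: 8088                               # instead of the default 8080
pact.web.uploaddir: /var/stratosphere/pact-jobs/   # hypothetical path
pact.web.plandump: /var/stratosphere/pact-plans/   # hypothetical path
```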
These parameters configure the default HDFS used by Stratosphere. If you don't specify an HDFS configuration, you will have to specify the full path to your HDFS files, like `hdfs://address:port/path/to/files`.
Key | Description | Default Value |
---|---|---|
fs.hdfs.hdfsdefault | The absolute path of Hadoop's own configuration file “hdfs-default.xml”. | null |
fs.hdfs.hdfssite | The absolute path of Hadoop's own configuration file “hdfs-site.xml”. | null |
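A sketch of an HDFS configuration follows. The directory `/opt/hadoop/conf` is an assumption about where Hadoop's configuration files are located on the machine; substitute the actual absolute paths.

```yaml
fs.hdfs.hdfsdefault: /opt/hadoop/conf/hdfs-default.xml   # hypothetical location
fs.hdfs.hdfssite: /opt/hadoop/conf/hdfs-site.xml         # hypothetical location
```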
These parameters configure Nephele's profiling features.
Key | Description | Default Value |
---|---|---|
jobmanager.profiling.enable | Enables/disables the job manager's profiling component. | false |
jobmanager.profiling.classname | The class name of the job manager's profiling component to load if profiling is enabled. | null |
taskmanager.profiling.classname | The class name of the task manager's profiling component to load if profiling is enabled. | null |
taskmanager.profiling.rpc.numhandler | Number of threads for the profiler. | 3 |
taskmanager.profiling.reportinterval | The interval, in seconds, at which a task manager sends profiling data to the job manager. | 2 |
jobmanager.profiling.rpc.port | The job manager's profiling RPC port. | 6124 |
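Enabling profiling could look roughly like the sketch below. The report interval of 5 seconds is an arbitrary example value.

```yaml
jobmanager.profiling.enable: true
taskmanager.profiling.reportinterval: 5   # seconds; the default is 2
jobmanager.profiling.rpc.port: 6124       # same as the default
```

The two `classname` keys are left out here because this page does not name the profiling implementation classes. Since both default to null, they may need to be set as well for profiling to work.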
These parameters configure Nephele's visualization client.
Key | Description | Default Value |
---|---|---|
visualization.bottleneckDetection.enable | Whether bottleneck detection is enabled (boolean). | false |