Quickstart: Setup

Get Stratosphere up and running in a few simple steps.

Stratosphere runs on UNIX-like environments such as Linux, Mac OS X, and Cygwin. The only requirement is a working installation of Java 6.x or higher.
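To confirm the Java requirement before downloading anything, a check along the following lines may help. It is only a sketch: the helper name java_major is made up for this example, and it assumes that `java -version` prints the version as the first quoted string on stderr, which is the case for common JVMs.

```shell
# Hypothetical helper: reduce a version string to its major number.
# "1.6.0_45" -> 6 (old 1.x scheme), "11.0.2" -> 11.
java_major() {
  v=${1#1.}               # drop a leading "1." if present
  echo "${v%%[!0-9]*}"    # keep only the leading digits
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr; take the first quoted string.
  version=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  major=$(java_major "$version")
  if [ "${major:-0}" -ge 6 ]; then
    echo "Java $version found: OK"
  else
    echo "Java $version is too old; Java 6 or higher is required" >&2
  fi
else
  echo "No java executable found on PATH" >&2
fi
```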

Download the ready-to-run binary package. Choose the Stratosphere distribution that matches your Hadoop version. If you are unsure which version to choose, or you just want to run locally, pick the package for Hadoop 1.2.

You are almost done.

  1. Go to the download directory,
  2. Unpack the downloaded archive, and
  3. Start Stratosphere.

$ cd ~/Downloads              # Go to download directory
$ tar xzf stratosphere-*.tgz  # Unpack the downloaded archive
$ cd stratosphere
$ bin/start-local.sh          # Start Stratosphere

Check the JobManager's web frontend at http://localhost:8081 and make sure everything is up and running.
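If you prefer to check from the terminal, or want a script to wait until the frontend answers, a small polling loop with curl is one option. This is a sketch only: the function name wait_for_frontend and the retry count are assumptions made for this example, and a successful response shows only that the web frontend is reachable, not that every component is healthy.

```shell
# Hypothetical helper: poll a URL once per second until it answers
# with a successful HTTP status, or give up after the given number of tries.
wait_for_frontend() {
  url=$1
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -s: silent, -f: fail on HTTP errors, -o /dev/null: discard the body
    if curl -s -f -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, wait_for_frontend http://localhost:8081 30 && echo "JobManager frontend is up" waits up to roughly 30 seconds after bin/start-local.sh.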

Run the Word Count example to see Stratosphere at work.

  1. Download test data:
    $ wget -O hamlet.txt http://www.gutenberg.org/cache/epub/1787/pg1787.txt
    
    You now have a text file called hamlet.txt in your working directory.
  2. Start the example program:
    $ bin/pact-client.sh run \
        --jarfile ./examples/pact/pact-examples-0.4-SNAPSHOT-WordCount.jar \
        --arguments 1 file://`pwd`/hamlet.txt file://`pwd`/wordcount-result.txt
    
    You will find a file called wordcount-result.txt in your current directory.
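To take a quick look at the result, you could list the most frequent words. The sketch below assumes that each line of wordcount-result.txt has the form "word count", separated by whitespace; check the file first, since the actual output format may differ. The helper name top_words is made up for this example.

```shell
# Hypothetical helper: print the N lines with the highest count.
# Assumes lines of the form "<word> <count>".
top_words() {
  # $1: result file, $2: number of lines to show (default 10)
  sort -k2,2nr "$1" | head -n "${2:-10}"
}

# Show the ten most frequent words if the result file exists.
if [ -f wordcount-result.txt ]; then
  top_words wordcount-result.txt 10
fi
```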

Running Stratosphere on a cluster is as easy as running it locally. With passwordless SSH and the same directory structure on all your cluster nodes, you can use our scripts to control everything.

  1. Copy Stratosphere to the same file system path on each node of your setup.
  2. Choose a master node (JobManager) and set the jobmanager.rpc.address key in conf/stratosphere-conf.yaml to its IP or hostname. Make sure that all nodes in your cluster have the same jobmanager.rpc.address configured.
  3. Add the IPs or hostnames (one per line) of all worker nodes (TaskManagers) to the slaves file, conf/slaves.
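The three steps above can be sketched as a handful of shell functions. All function names and the host names in the usage note are placeholders invented for this example, not part of Stratosphere. The sketch also assumes that rsync is installed on every node and that stratosphere-conf.yaml stores the key as a plain "key: value" line.

```shell
# Step 2: set jobmanager.rpc.address in a config file.
# Replaces the key if it is already present, appends it otherwise.
set_jobmanager_address() {
  conf=$1
  addr=$2
  if grep -q '^jobmanager\.rpc\.address:' "$conf" 2>/dev/null; then
    sed "s|^jobmanager\.rpc\.address:.*|jobmanager.rpc.address: $addr|" \
      "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
  else
    echo "jobmanager.rpc.address: $addr" >> "$conf"
  fi
}

# Step 3: write the worker hosts, one per line, to the slaves file.
write_slaves() {
  file=$1
  shift
  printf '%s\n' "$@" > "$file"
}

# Step 1: copy the installation to the same path on every worker.
# Relies on passwordless SSH; run it after editing the configuration
# so that all nodes receive the same jobmanager.rpc.address.
copy_to_workers() {
  dir=$1
  shift
  for host in "$@"; do
    rsync -a "$dir/" "$host:$dir/"
  done
}
```

Run from the Stratosphere directory on the master node, a session might look like this (host names are placeholders):

    set_jobmanager_address conf/stratosphere-conf.yaml master.example.com
    write_slaves conf/slaves worker1.example.com worker2.example.com
    copy_to_workers "$PWD" worker1.example.com worker2.example.com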

You can now start the cluster from your master node with bin/start-cluster.sh.

If you have a custom setup, you can also start the JobManager on the master node and the TaskManagers on the worker nodes manually.


For more detailed instructions, check out the Documentation.