Thursday, June 28, 2012

Running an R Script on AWS EMR


  • Install the Amazon EMR Command Line Interface
    1. Install Ruby (1.8 or later)
    2. Download and unzip the CLI (http://aws.amazon.com/developertools/2264)
    3. Configure credentials.json (in the directory where the CLI was unzipped)
       {
         "access_id": "AWS Access Key ID",
         "private_key": "AWS Secret Access Key",
         "keypair": "EC2 keypair name",
         "key-pair-file": "pem location",
         "log_uri": "s3n://log-location",
         "region": "us-east-1"
       }
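
Since this file holds the secret key, it can be convenient to generate it rather than edit it by hand. The following is a minimal sketch, not part of the CLI itself: the function name write_credentials and the environment variables EC2_KEYPAIR_NAME, EC2_KEYPAIR_FILE, EMR_LOG_URI and EMR_REGION are assumptions made for this example. Only the JSON keys come from the configuration shown above.

```shell
#!/bin/sh
# Sketch: generate credentials.json from environment variables.
# The environment variable names are assumptions made for this example;
# the CLI only reads the JSON file that results.
# The function body is a subshell, so the umask change and any early
# exit stay local to the call.
write_credentials() (
  out="${1:-credentials.json}"

  # Abort with a message if a required value is missing or empty.
  : "${AWS_ACCESS_KEY_ID:?must be set}"
  : "${AWS_SECRET_ACCESS_KEY:?must be set}"
  : "${EC2_KEYPAIR_NAME:?must be set}"
  : "${EC2_KEYPAIR_FILE:?must be set}"
  : "${EMR_LOG_URI:?must be set}"

  # A newly created file will be readable by its owner only.
  umask 077

  # Values are inserted verbatim; this assumes they contain no double
  # quotes or backslashes, which would need JSON escaping.
  cat > "$out" <<EOF
{
  "access_id": "${AWS_ACCESS_KEY_ID}",
  "private_key": "${AWS_SECRET_ACCESS_KEY}",
  "keypair": "${EC2_KEYPAIR_NAME}",
  "key-pair-file": "${EC2_KEYPAIR_FILE}",
  "log_uri": "${EMR_LOG_URI}",
  "region": "${EMR_REGION:-us-east-1}"
}
EOF
)
```

With the variables exported, calling write_credentials with no argument writes credentials.json to the current directory; the region falls back to us-east-1, as in the example above, when EMR_REGION is not set.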

  • Job Flow Essentials
    1. Creating a Job Flow (./elastic-mapreduce --create --alive)
    2. Listing all Job Flows (./elastic-mapreduce --list)
    3. Retrieving information about a specific Job Flow (./elastic-mapreduce --describe --jobflow ID)
    4. Adding a streaming step with default parameter values to a Job Flow (./elastic-mapreduce -j ID --stream)
    5. Terminating a Job Flow (./elastic-mapreduce --terminate ID)
    6. Listing all active Job Flows (./elastic-mapreduce --list --active)
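
The commands above can be wrapped in small shell functions so that a job flow ID is never forgotten and the exact command line can be previewed first. This is only a sketch: the function names and the EMR_CLI and EMR_DRY_RUN variables are assumptions made for this example, and the only CLI options used are the ones listed above.

```shell
#!/bin/sh
# Sketch: thin wrappers around the elastic-mapreduce commands listed above.
# EMR_CLI is the path to the CLI script; with EMR_DRY_RUN=1 the command is
# printed instead of executed. Both variable names are assumptions.
EMR_CLI="${EMR_CLI:-./elastic-mapreduce}"

emr() {
  if [ "${EMR_DRY_RUN:-0}" = "1" ]; then
    # Preview only: show the command line that would be run.
    echo "$EMR_CLI $*"
  else
    "$EMR_CLI" "$@"
  fi
}

# Refuse to run ID-based commands without a job flow ID.
require_id() {
  [ -n "${1:-}" ] || { echo "job flow ID required" >&2; return 2; }
}

jobflow_create()    { emr --create --alive; }
jobflow_list()      { emr --list; }
jobflow_active()    { emr --list --active; }
jobflow_describe()  { require_id "${1:-}" && emr --describe --jobflow "$1"; }
jobflow_terminate() { require_id "${1:-}" && emr --terminate "$1"; }
```

For example, with EMR_DRY_RUN=1 set, jobflow_describe j-EXAMPLE (a made-up ID) only prints the command, which is a cheap way to check it before a cluster is touched.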
  • Streaming Job Flow
                ./elastic-mapreduce --create --stream \
                                     --mapper s3n://[mapper-location] \
                                     --input s3n://[input-location] \
                                     --output s3n://[output-location] \
                                     --reducer s3n://[reducer-location]
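
Before uploading a mapper and reducer to S3, it helps to try them locally. A streaming mapper reads lines on standard input and writes tab-separated key/value lines; the reducer then receives those lines sorted by key. The sketch below approximates that flow with a plain pipe. The function name simulate_streaming is an assumption made for this example, and a local sort only imitates the shuffle on a real cluster, so treat it as a smoke test.

```shell
#!/bin/sh
# Sketch: run a mapper and a reducer locally the way a streaming job would
# chain them: input -> mapper -> sort by key -> reducer.
# Usage: simulate_streaming MAPPER_CMD REDUCER_CMD INPUT_FILE
simulate_streaming() {
  mapper="$1"
  reducer="$2"
  input="$3"

  [ -r "$input" ] || { echo "cannot read input: $input" >&2; return 1; }

  # Each command runs through sh -c, so it may include an interpreter,
  # e.g. "Rscript mapper.R". LC_ALL=C sorts in byte order, which keeps
  # identical keys on adjacent lines for the reducer.
  sh -c "$mapper" < "$input" | LC_ALL=C sort | sh -c "$reducer"
}
```

For an R job this might be called as simulate_streaming "Rscript mapper.R" "Rscript reducer.R" sample.txt, where the three file names are placeholders for your own scripts and a small sample of the input.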

