hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "AmazonEC2" by DougCutting
Date Fri, 27 Oct 2006 20:32:03 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by DougCutting:
http://wiki.apache.org/lucene-hadoop/AmazonEC2

------------------------------------------------------------------------------
  
   * '''Amazon Machine Image (AMI)''', or ''image''.  A bootable Linux image with software pre-installed.
   * '''instance'''.  A host running an AMI.
+ 
+ == Conventions ==
+ 
+ In this document, command lines that start with '#' are executed on an instance, while
command lines starting with a '%' are executed on your workstation.
   
  == Building an Image ==
  
@@ -15, +19 @@

  
  To build an image for Hadoop:
  
-  1. Start an instance of the fedora base image.
+  1. Run an instance of the fedora base image.
  
   1. Login to this instance (using ssh).
  
@@ -59, +63 @@

  # bin/stop-all.sh
  }}}
  
-  1. Save the image, using Amazon's instructions.
+  1. Create a new image, using Amazon's instructions (bundle, upload & register).
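+ 
+ For reference, the bundle, upload & register steps look roughly like the following (a sketch only; the key, certificate, bucket and user id values are placeholders you must substitute with your own):
+ 
+ {{{
+ # ec2-bundle-vol -d /mnt -k /mnt/pk.pem -c /mnt/cert.pem -u XXXXXXXXXXX
+ # ec2-upload-bundle -b my-hadoop-images -m /mnt/image.manifest.xml -a <access-key> -s <secret-key>
+ % ec2-register my-hadoop-images/image.manifest.xml
+ }}}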
  
  == Configuring Hadoop ==
  
@@ -69, +73 @@

  
  
  === hadoop-env.sh ===
+ 
+ Specify the JVM location, the log directory and that rsync should be used to update slaves
from the master.
  
  {{{
  # Set Hadoop-specific environment variables here.
@@ -87, +93 @@

  # otherwise arrive faster than the master can service them.
  export HADOOP_SLAVE_SLEEP=1
  }}}
+ 
+ You must also create the log directory.
  
  {{{
  # mkdir -p /mnt/hadoop/logs
@@ -138, +146 @@

  </configuration>
  }}}
  
- == Running your cluster ==
+ == Security ==
  
+ To access your cluster, you must enable access to at least port 22, for ssh.
+ 
+ {{{
+ % ec2-authorize default -p 22
+ }}}
+ 
+ Hadoop uses ports between 50000 and 50100.  You can permit arbitrary access to your cluster
from remote networks with:
+ 
+ {{{
+ % ec2-authorize default -p 50000-50100
+ }}}
+ 
+ CAUTION: This is very insecure.  With this configuration, all your DFS data is publicly
readable, and anonymous users anywhere on the internet may submit jobs to your cluster, uploading
arbitrary code.  To secure your cluster you should not open these ports to the public, but
rather only to other hosts within the cluster.  Amazon's security group mechanisms make this
possible.  The following is untested.
+ 
+ {{{
+ % ec2-add-group my-group
+ % ec2-authorize my-group -o my-group -u XXXXXXXXXXX
+ }}}
+ 
+ == Launching your cluster ==
+ 
+ Start by allocating instances of your image.  Use '''ec2-describe-images''' to find your
image id, shown as ami-XXXXXXXX below. 
+ 
+ To run a 20-node cluster:
+ 
+ {{{
+ % ec2-describe-images
+ % ec2-run-instances ami-XXXXXXXX -k gsg-keypair -g my-group -n 20
+ }}}
+ 
+ Wait a few minutes for the instances to launch.
+ 
+ {{{
+ % ec2-describe-instances
+ }}}
+ 
