hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "AmazonEC2" by JoydeepSensarma
Date Fri, 15 May 2009 09:33:21 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by JoydeepSensarma:

The comment on the change is:
add note about submitting jobs from machines outside EC2

   * Keep in mind that the master node is started and configured first, then all slave nodes
are booted simultaneously with boot parameters pointing to the master node. Even though the
`launch-cluster` command has returned, the whole cluster may not yet have booted. You should
monitor the cluster via port 50030 to make sure all nodes are up.
+ === Running a job on a cluster from a remote machine (0.17+) ===
+ In some cases it's desirable to submit a job to a Hadoop cluster running in EC2
from a machine that's outside EC2 (for example, a personal workstation). Similarly, it's convenient
to be able to browse/cat files in HDFS from a remote machine. One advantage of this
technique is that it obviates the need to create custom AMIs that bundle stock Hadoop AMIs
and user libraries/code. All the non-Hadoop code can be kept on the remote machine and
made available to Hadoop at job submission time (in the form of jar files and other
files that are copied into Hadoop's distributed cache). The only downside is the
[http://aws.amazon.com/ec2/#pricing cost of copying these data sets] into EC2 and the latency
involved in doing so.
+ The recipe for doing this is well described in
[http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/ this Cloudera blog post]
and involves configuring Hadoop to use an SSH tunnel through the master
Hadoop node. Note that this recipe only works when using EC2 scripts from versions of
Hadoop that have the fix for [https://issues.apache.org/jira/browse/HADOOP-5839 HADOOP-5839]
incorporated. (Alternatively, users can apply the patches from this JIRA to older versions of
Hadoop that do not have this fix.)
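As a rough illustration of the tunnel-based recipe described above, the following Python sketch builds the `ssh` command that opens a SOCKS proxy through the master node and renders a client-side `hadoop-site.xml` that routes Hadoop RPC through that proxy. It is a minimal sketch under stated assumptions, not the exact procedure from the linked blog post: the helper names, the master host name, the `root` login, the SOCKS port 6666, and the NameNode/JobTracker ports are all hypothetical placeholders to be replaced with the values for your own cluster. The two properties `hadoop.rpc.socket.factory.class.default` and `hadoop.socks.server` are standard Hadoop configuration keys.

```python
import shlex
import xml.etree.ElementTree as ET


def socks_tunnel_command(master_host, socks_port=6666, user="root"):
    """Build the ssh argv that opens a local SOCKS proxy via the master node.

    The user name and port are assumptions; adjust them for your cluster.
    """
    # -N: do not run a remote command; -D: dynamic (SOCKS) port forwarding.
    return ["ssh", "-N", "-D", str(socks_port), f"{user}@{master_host}"]


def client_site_xml(master_host, namenode_port, jobtracker_port, socks_port=6666):
    """Render a client-side hadoop-site.xml that sends Hadoop RPC through the proxy."""
    properties = {
        # Where the remote client should look for HDFS and the JobTracker.
        "fs.default.name": f"hdfs://{master_host}:{namenode_port}",
        "mapred.job.tracker": f"{master_host}:{jobtracker_port}",
        # Make Hadoop RPC sockets connect through the local SOCKS proxy.
        "hadoop.rpc.socket.factory.class.default":
            "org.apache.hadoop.net.SocksSocketFactory",
        "hadoop.socks.server": f"localhost:{socks_port}",
    }
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder host and ports; check your cluster's own configuration.
    master = "ec2-master.example.com"
    print(shlex.join(socks_tunnel_command(master)))
    print(client_site_xml(master, namenode_port=50001, jobtracker_port=50002))
```

With the tunnel running in one terminal and the generated file placed in the client's Hadoop configuration directory, commands such as `hadoop fs -ls /` issued from the remote machine would be expected to go through the master node. Whether this works end to end depends on the HADOOP-5839 fix mentioned above.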
  == Troubleshooting (Pre 0.17) ==
  Running Hadoop on EC2 involves a high level of configuration, so it can take a few attempts
to get the system working for your particular setup.
