hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "AmazonEC2" by DougCutting
Date Sat, 28 Oct 2006 03:35:11 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by DougCutting:
http://wiki.apache.org/lucene-hadoop/AmazonEC2

------------------------------------------------------------------------------
  = Running Hadoop on Amazon EC2 =
  
- [http://www.amazon.com/gp/browse.html?node=201590011 Amazon EC2] is a computing service.
 One allocates a set of hosts, and runs one's application on them, then, when done, de-allocates
the hosts.  Billing is hourly per host.  Thus EC2 permits one to deploy Hadoop on a cluster
without having to own and operate that cluster, renting it instead on an hourly basis.
+ [http://www.amazon.com/gp/browse.html?node=201590011 Amazon EC2] (Elastic Compute Cloud)
is a computing service.  One allocates a set of hosts, and runs one's application on them,
then, when done, de-allocates the hosts.  Billing is hourly per host.  Thus EC2 permits one
to deploy Hadoop on a cluster without having to own and operate that cluster, renting
it instead on an hourly basis.
  
  This document assumes that you have already followed the steps in [http://docs.amazonwebservices.com/AmazonEC2/gsg/2006-06-26/
Amazon's Getting Started Guide].
  
@@ -212, +212 @@

  % ec2-terminate-instances `ec2-describe-instances | grep INSTANCE | cut -f 2`
  }}}
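
The termination one-liner above works by parsing the tab-separated output of `ec2-describe-instances`: rows tagged `INSTANCE` carry the instance ID in field 2. As a sketch, here is the same `grep`/`cut` pipeline applied to hypothetical sample output (the field layout shown is illustrative; the real command requires the Amazon EC2 command-line tools and valid credentials):

```shell
#!/bin/sh
# Stand-in for ec2-describe-instances, emitting sample tab-separated
# output of the same general shape (RESERVATION and INSTANCE rows).
sample_describe_output() {
  printf 'RESERVATION\tr-12345678\t123456789012\tdefault\n'
  printf 'INSTANCE\ti-0aaaa111\tami-2bb65342\tec2-1-2-3-4.compute-1.amazonaws.com\n'
  printf 'INSTANCE\ti-0bbbb222\tami-2bb65342\tec2-5-6-7-8.compute-1.amazonaws.com\n'
}

# Same pipeline as the wiki page: keep INSTANCE rows, take field 2
# (cut splits on tabs by default), yielding one instance ID per line.
ids=$(sample_describe_output | grep INSTANCE | cut -f 2)
echo "$ids"

# With real credentials one would then pass the IDs along:
#   ec2-terminate-instances $ids
```

Note that `grep INSTANCE` relies on the literal row tag; it does not match the `RESERVATION` rows, so only instance IDs reach `cut`.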
  
+ 
+ == Future Work ==
+ 
+ Ideally Hadoop could directly access job data from [http://www.amazon.com/gp/browse.html?node=16427261
Amazon S3] (Simple Storage Service).  Initial input could be read from S3 when a cluster is
launched, and the final output could be written back to S3 before the cluster is decommissioned.
 Intermediate, temporary data, only needed between MapReduce passes, would be more efficiently
stored in Hadoop's DFS.  This would require an implementation of a Hadoop [http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/fs/FileSystem.html
FileSystem] for S3.  There are two issues in Hadoop's bug database related to this:
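+ 
+ The intended flow can be sketched in shell. Everything here is hypothetical: the `s3_put`/`s3_get` helpers are stand-ins backed by local directories (no S3 FileSystem exists in Hadoop yet), and a real version would go through the proposed FileSystem implementation rather than explicit copies:
+ 
+ {{{
#!/bin/sh
# Hypothetical sketch of the workflow described above: input from S3
# at launch, intermediate data in DFS, final output back to S3 before
# the cluster is decommissioned.
S3_BUCKET=./fake-s3-bucket     # stand-in for an S3 bucket
DFS_DIR=./fake-dfs             # stand-in for Hadoop's DFS
mkdir -p "$S3_BUCKET/job" "$DFS_DIR"

s3_put() { cp "$1" "$S3_BUCKET/$2"; }   # upload: local file -> bucket key
s3_get() { cp "$S3_BUCKET/$1" "$2"; }   # download: bucket key -> local file

# 1. Seed the bucket with job input (normally done before launch).
echo "input records" > input.txt
s3_put input.txt job/input.txt

# 2. At cluster launch: read initial input from S3 into DFS.
s3_get job/input.txt "$DFS_DIR/input.txt"

# 3. MapReduce passes keep intermediate data in DFS (simulated here).
tr '[:lower:]' '[:upper:]' < "$DFS_DIR/input.txt" > "$DFS_DIR/output.txt"

# 4. Before decommissioning: write final output back to S3.
s3_put "$DFS_DIR/output.txt" job/output.txt
cat "$S3_BUCKET/job/output.txt"
+ }}}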
+ 
+  * [http://issues.apache.org/jira/browse/HADOOP-574 HADOOP-574]
+  * [http://issues.apache.org/jira/browse/HADOOP-571 HADOOP-571]
+ 
+ Please vote for these issues in Jira if you feel this would help your project.  (Anyone
can create a Jira account in order to vote on issues.)
+ 
