hadoop-common-dev mailing list archives

From "Hitchcock, Andrew" <a...@amazon.com>
Subject Re: Hadoop Compatibility and EMR
Date Tue, 23 Mar 2010 18:12:54 GMT
We recommend that people use Amazon S3 as the durable store when using Elastic MapReduce. We
consider the HDFS on Elastic MapReduce clusters to be transient.

With that said, you need some way to get your data into S3 from HDFS. We recommend storing
the files directly in S3 (with S3N) and not using the S3 block file system. That presents
two challenges:

1. Making sure all files on your cluster are less than 5 GB.
2. Uploading your files without the use of S3N, which wasn't introduced until Hadoop 0.18 and so isn't available on your current cluster.

You'll probably want to write a DistCp-like job which reads the files from HDFS and uploads
them to S3. If necessary, it should also detect files that are larger than 5 GB and split
them into multiple pieces.
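
A minimal sketch of the splitting step described above, written in plain Python rather than as a Hadoop job. It only plans byte ranges and piece names; it does not read from HDFS or upload to S3. The names plan_pieces and piece_names, the ".piece-NNNNN" suffix, and the exact byte value used for the 5 GB limit are illustrative assumptions, not part of Hadoop or Elastic MapReduce.

```python
from typing import Iterator, List, Tuple

# Assumed byte value for the 5 GB limit mentioned above. Since files should be
# less than 5 GB, a real job may prefer a smaller piece size.
MAX_OBJECT_BYTES = 5 * 1024 ** 3


def plan_pieces(file_size: int,
                max_piece: int = MAX_OBJECT_BYTES) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) byte ranges covering the file, each at most max_piece."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    if max_piece <= 0:
        raise ValueError("max_piece must be positive")
    offset = 0
    while offset < file_size:
        length = min(max_piece, file_size - offset)
        yield offset, length
        offset += length


def piece_names(path: str, file_size: int,
                max_piece: int = MAX_OBJECT_BYTES) -> List[str]:
    """Return the destination names for a file: the original name if it fits
    in one piece, otherwise one numbered name per piece (hypothetical scheme)."""
    count = sum(1 for _ in plan_pieces(file_size, max_piece))
    if count <= 1:
        return [path]
    return [f"{path}.piece-{i:05d}" for i in range(count)]
```

With these assumptions, a 12 GiB file is planned as three pieces: two of 5 GiB and one of 2 GiB. A file that already fits under the limit keeps its original name.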

Andrew




On 3/22/10 9:23 PM, "ilayaraja" <ilayaraja@rediff.co.in> wrote:

Hi Andrew,

Yes. The data is on an EC2 cluster only.

Regards,
Ilay

----- Original Message -----
From: Hitchcock, Andrew <mailto:anhi@amazon.com>
To: common-dev@hadoop.apache.org; ilayaraja@rediff.co.in
Sent: Tuesday, March 23, 2010 1:57 AM
Subject: Re: Hadoop Compatibility and EMR


Hi,

At this time Elastic MapReduce only supports Hadoop 0.18.3.

Is the cluster that stores the 10 TB of data currently running on Amazon EC2?

Regards,
Andrew

On Mar 21, 2010, at 12:23 AM, "ilayaraja" <ilayaraja@rediff.co.in> wrote:
> Hi,
>
> We've been using Hadoop 15.5 in our production environment, where we have about 10 TB
> of data stored on the DFS.
> The files were generated as MapReduce output. We want to move our environment to Amazon
> Elastic MapReduce (EMR), which raises the following questions for us:
>
> 1. EMR supports only Hadoop 19.0 and above. Is it possible to use the current data,
> which was generated with Hadoop 15.5, from Hadoop 19.0?
>
> 2. Or how can we upgrade from Hadoop 15.5 to Hadoop 19.0? What issues should we
> expect while doing so?
>
>
> Regards,
> Ilayaraja

