Sorry, Prabhu, for hijacking this discussion a bit. I wonder what the best practice is for loading data into HDFS in general. Considering the size of the data (often many GBs or even TBs), how are storage and time constraints handled?

 

If anybody can share their experiences or best practices, that would be great!

 

-Shailesh.

 

From: Chen He [mailto:airbots@gmail.com]
Sent: Wednesday, September 05, 2012 7:34 PM
To: user@hadoop.apache.org
Subject: Re: One petabyte of data loading into HDFS with in 10 min.

 

If it is not a single file, you can upload the files to HDFS using multiple threads.
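
For anyone wanting a concrete starting point, here is a minimal sketch (not from the original message) of a multi-threaded upload using the Hadoop FileSystem API; the local source directory, HDFS target path, and thread-pool size are illustrative assumptions you would tune for your own cluster:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelHdfsUpload {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        File localDir = new File("/data/incoming");        // assumed local source directory
        Path hdfsDir  = new Path("/user/hadoop/incoming"); // assumed HDFS target directory
        fs.mkdirs(hdfsDir);

        // One in-flight copy per thread; size the pool to your network and disks
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (File f : localDir.listFiles()) {
            if (!f.isFile()) continue;
            pool.submit(() -> {
                try {
                    // Streams the local file into HDFS as blocks with normal replication
                    fs.copyFromLocalFile(new Path(f.getAbsolutePath()),
                                         new Path(hdfsDir, f.getName()));
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        fs.close();
    }
}

If the source data already sits on a Hadoop-compatible filesystem (another HDFS cluster, S3, etc.), hadoop distcp does essentially the same thing at scale by running the copy as a parallel MapReduce job.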

On Wed, Sep 5, 2012 at 7:21 AM, prabhu K <prabhu.hadoop@gmail.com> wrote:

Hi Users,

 

Please clarify the questions below.

 

1. To load one petabyte of data into HDFS/Hive within 10 minutes, how many slave machines (DataNodes) are required?

 

2. To load one petabyte of data into HDFS/Hive within 10 minutes, what configuration setup is needed in a cloud environment?
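
To put rough numbers on the 10-minute window, here is a back-of-envelope sketch (purely illustrative; the per-node write rate and replication factor are assumptions, not measurements from any particular cluster):

public class PetabyteIn10Minutes {
    public static void main(String[] args) {
        double bytesToLoad   = 1e15;        // 1 PB (decimal)
        double windowSeconds = 10 * 60;     // 10 minutes
        double ingestRate    = bytesToLoad / windowSeconds;   // ~1.67e12 B/s (~1.7 TB/s)

        int    replication   = 3;           // HDFS default replication factor (assumption)
        double diskWriteRate = ingestRate * replication;      // ~5 TB/s written across the cluster

        double perNodeRate   = 400e6;       // ASSUMPTION: ~400 MB/s sustained writes per DataNode
        double nodesNeeded   = diskWriteRate / perNodeRate;   // ~12,500 nodes under these assumptions

        System.out.printf("ingest %.2f TB/s, cluster write %.2f TB/s, ~%.0f DataNodes%n",
                ingestRate / 1e12, diskWriteRate / 1e12, nodesNeeded);
        // The source systems and network fabric would have to sustain the same rates,
        // so in practice the network is usually the first limit, not the node count.
    }
}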

 

Please suggest and help me with this.

 

Thanks & Regards,

Prabhu.