hadoop-user mailing list archives

From Shailesh Dargude <Shailesh_Darg...@symantec.com>
Subject RE: One petabyte of data loading into HDFS with in 10 min.
Date Wed, 05 Sep 2012 14:14:59 GMT
Sorry, Prabhu, for hijacking this discussion a bit. I wonder what the best practice is for
loading data into HDFS in general. Considering the size of the data (often gigabytes
or terabytes), how are storage and time constraints handled?

If anybody can share their experiences or best practices, it would be great!

-Shailesh.

From: Chen He [mailto:airbots@gmail.com]
Sent: Wednesday, September 05, 2012 7:34 PM
To: user@hadoop.apache.org
Subject: Re: One petabyte of data loading into HDFS with in 10 min.

If the data is not a single file, you can upload the files to HDFS using multiple threads.
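
A minimal sketch of the multi-threaded upload idea, assuming the files sit on a local disk and the `hadoop` command-line client is on the PATH. It shells out to `hadoop fs -put` once per file from a thread pool. The function names (`put_file`, `upload_all`), the default of 8 workers, and the paths in the usage note are illustrative assumptions, not something stated in this thread.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def put_file(local_path, hdfs_dir):
    """Copy one local file into an HDFS directory with `hadoop fs -put`."""
    # check=True raises CalledProcessError if the CLI exits non-zero.
    subprocess.run(["hadoop", "fs", "-put", local_path, hdfs_dir], check=True)
    return local_path


def upload_all(local_paths, hdfs_dir, workers=8, put=put_file):
    """Upload many files concurrently.

    `workers` is an assumed default; tune it to the client's disk and
    network capacity. `put` can be swapped out, e.g. for a dry run.
    Returns (succeeded, failed) lists of local paths, in input order.
    """
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit every upload up front, then collect results per file.
        futures = {pool.submit(put, path, hdfs_dir): path for path in local_paths}
        for future, path in futures.items():
            try:
                future.result()
                succeeded.append(path)
            except Exception:
                failed.append(path)
    return succeeded, failed
```

For example, `upload_all(["/data/part-0001.csv", "/data/part-0002.csv"], "/user/demo/input")` would start both copies in parallel (the paths here are hypothetical). A single client machine is still bounded by its own disk and network link, so very large loads usually mean running uploads from many machines at once.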
On Wed, Sep 5, 2012 at 7:21 AM, prabhu K <prabhu.hadoop@gmail.com> wrote:
Hi Users,

Please clarify the questions below.

1. To load one petabyte of data into HDFS/Hive within 10 minutes, how many slave (DataNode)
machines are required?

2. To load one petabyte of data into HDFS/Hive within 10 minutes, what configuration
setup is needed for cloud computing?
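
As a rough back-of-envelope for question 1: one petabyte (10^15 bytes) in 600 seconds is about 1.67 TB/s of sustained ingest before replication. The sketch below turns that into a lower bound on DataNode count. The per-node write rate of 100 MB/s is purely an assumed figure for illustration, and the function name `min_datanodes` is hypothetical; HDFS's default replication factor of 3 triples the bytes actually written.

```python
PETABYTE = 10**15          # bytes (decimal petabyte)
WINDOW_SECONDS = 10 * 60   # the 10-minute target from the question


def min_datanodes(total_bytes, seconds, per_node_bytes_per_s, replication=1):
    """Lower bound on DataNodes needed to absorb the write load.

    Ignores network topology, NameNode load, and client-side limits;
    per_node_bytes_per_s is an assumed sustained write rate per node.
    """
    bytes_to_write = total_bytes * replication
    node_capacity = seconds * per_node_bytes_per_s
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-bytes_to_write // node_capacity)


# Assumed: each node sustains 100 MB/s of writes.
ASSUMED_NODE_RATE = 100 * 10**6

print(min_datanodes(PETABYTE, WINDOW_SECONDS, ASSUMED_NODE_RATE))
print(min_datanodes(PETABYTE, WINDOW_SECONDS, ASSUMED_NODE_RATE, replication=3))
```

Under these assumptions the count lands in the tens of thousands of nodes, which suggests the 10-minute target is dominated by aggregate disk and network bandwidth rather than by any single configuration setting.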

Please advise and help me with this.

Thanks & Regards,
Prabhu.


