hadoop-user mailing list archives

From Mohammad Tariq <donta...@gmail.com>
Subject Re: One petabyte of data loading into HDFS with in 10 min.
Date Wed, 05 Sep 2012 14:22:36 GMT
Hello Shailesh,

      Give distcp a shot. It runs a MapReduce job to copy data from the source
to the destination, so the data is copied in parallel.
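
A minimal invocation would look something like the following (the hostnames,
paths, and map count are just placeholders; tune -m, which caps the number of
simultaneous copies, to what your cluster can handle):

    # source/destination URIs below are placeholders for your clusters
    hadoop distcp -m 64 \
        hdfs://source-nn:8020/data/input \
        hdfs://dest-nn:8020/data/input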

Regards,
    Mohammad Tariq



On Wed, Sep 5, 2012 at 7:44 PM, Shailesh Dargude <
Shailesh_Dargude@symantec.com> wrote:

> Sorry Prabhu for hijacking this discussion a bit. I wonder, what is the
> best practice for loading data into HDFS in general? Considering the size
> of the data (many times it is in GBs or TBs), how are the storage and
> time constraints handled?
>
> If anybody can share their experiences or best practices, it would be great!
>
> -Shailesh.
>
> From: Chen He [mailto:airbots@gmail.com]
> Sent: Wednesday, September 05, 2012 7:34 PM
> To: user@hadoop.apache.org
> Subject: Re: One petabyte of data loading into HDFS with in 10 min.
>
> If it is not a single file, you can upload the files to HDFS using
> multiple threads.
>
> On Wed, Sep 5, 2012 at 7:21 AM, prabhu K <prabhu.hadoop@gmail.com> wrote:
>
> Hi Users,
>
> Please clarify the questions below.
>
> 1. To load one petabyte of data into HDFS/Hive within 10 minutes, how many
> slave machines (DataNodes) are required?
>
> 2. To load one petabyte of data into HDFS/Hive within 10 minutes, what
> configuration setup is required for cloud computing?
>
> Please suggest and help me with this.
>
> Thanks & Regards,
>
> Prabhu.
>
