hadoop-user mailing list archives

From Nick Jones <nick.jo...@amd.com>
Subject Re: One petabyte of data loading into HDFS with in 10 min.
Date Wed, 05 Sep 2012 14:59:01 GMT
Since cost wasn't mentioned as a requirement...

An army of people mounting physical drives with the original dataset to 
the cluster of machines and M/R copying from local disk would likely be 
faster.

There are also 40Gbps Infiniband solutions available.  Also, the 
replication could be pushed to a separate network and would eventually 
achieve consistency (presumably not required in 10mins), thus lowering 
the amount of data the primary network has to carry within the 10 
minutes from 3PB to 1PB.
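
As a rough illustration of that last point, here is a minimal Python 
sketch of how many 40Gbps links would be needed to carry 1PB versus 3PB 
in 10 minutes. It assumes decimal units (1PB = 10**15 bytes, 1Gbps = 
10**9 bit/s), links running at full nominal rate, and no protocol or 
disk overhead. The name links_needed is hypothetical, chosen only for 
this example.

```python
def links_needed(petabytes, link_gbps=40, minutes=10):
    """Whole number of links needed to move the data before the deadline.

    Assumptions (not from the thread): decimal units, every link runs at
    its nominal rate for the whole window, zero protocol or disk overhead.
    """
    total_bits = petabytes * 10**15 * 8
    bits_per_link = link_gbps * 10**9 * minutes * 60
    # Ceiling division on integers avoids floating-point rounding.
    return -(-total_bits // bits_per_link)


primary_only = links_needed(1)      # replication moved to a separate network
with_replication = links_needed(3)  # 1PB plus two extra replicas
print(primary_only, with_replication)
```

Under these assumptions the primary network needs about a third as many 
links once replication traffic is moved off it. Real throughput would be 
lower than line rate, so treat both numbers as lower bounds.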

On 09/05/2012 09:43 AM, Cosmin Lehene wrote:
> Here's an extremely naïve ballpark estimation: at theoretical hardware 
> speed, for 3PB representing 1PB with 3x replication
>
> Over a single 1Gbps connection (and I'm not sure you can actually 
> reach 1Gbps):
> (3 petabytes) / (1 Gbps) = 291.271111 days
>
> So you'd need at least 40,000 1Gbps network cards to get that in 10 
> minutes :) - (3PB/1Gbps)/40000 
> <http://www.google.ro/search?client=safari&rls=en&q=%283PB/1Gbps%29/40000&ie=UTF-8&oe=UTF-8&redir_esc=&ei=2WRHUNWtGIWo0QW52oDYDw>
>
> The actual number of nodes would depend a lot on the actual network 
> architecture, the type of storage you use (SSD,  HDD), etc.
> Cosmin
> From: prabhu K <prabhu.hadoop@gmail.com>
> Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Date: Wednesday, September 5, 2012 3:21 PM
> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Subject: One petabyte of data loading into HDFS with in 10 min.
>
> Hi Users,
> Please clarify the questions below.
> 1. To load one petabyte of data into HDFS/Hive within 10 minutes, how 
> many slave machines (DataNodes) are required?
> 2. To load one petabyte of data into HDFS/Hive within 10 minutes, what 
> configuration setup is required for cloud computing?
> Please advise and help me with this.
> Thanks & Regards,
> Prabhu.
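
For anyone who wants to re-run the ballpark estimate quoted above, here 
is a minimal Python sketch. The helper names are hypothetical. The unit 
conventions are an inference, not something stated in the thread: the 
291.27-day figure appears to match binary prefixes (PB = 2**50 bytes, 
Gbps = 2**30 bit/s), while the 40,000-card figure matches decimal 
prefixes (PB = 10**15 bytes, Gbps = 10**9 bit/s). Both cases ignore 
protocol overhead and disk speed.

```python
SECONDS_PER_DAY = 86_400


def transfer_seconds(total_bytes, link_bits_per_second):
    """Seconds to push total_bytes over one link at its nominal rate."""
    return total_bytes * 8 / link_bits_per_second


def links_for_deadline(total_bytes, link_bits_per_second, deadline_seconds):
    """Smallest whole number of links that meets the deadline."""
    total_bits = total_bytes * 8
    bits_per_link = link_bits_per_second * deadline_seconds
    return -(-total_bits // bits_per_link)  # integer ceiling division


TEN_MINUTES = 10 * 60

# Assumed binary prefixes: 3PB = 3 * 2**50 bytes, 1Gbps = 2**30 bit/s.
binary_days = transfer_seconds(3 * 2**50, 2**30) / SECONDS_PER_DAY
binary_links = links_for_deadline(3 * 2**50, 2**30, TEN_MINUTES)

# Assumed decimal prefixes: 3PB = 3 * 10**15 bytes, 1Gbps = 10**9 bit/s.
decimal_days = transfer_seconds(3 * 10**15, 10**9) / SECONDS_PER_DAY
decimal_links = links_for_deadline(3 * 10**15, 10**9, TEN_MINUTES)

print(f"binary:  {binary_days:.2f} days on one link, {binary_links} links")
print(f"decimal: {decimal_days:.2f} days on one link, {decimal_links} links")
```

Either way the answer lands in the range of roughly 40,000 to 42,000 
fully saturated 1Gbps links, which is the point of the estimate.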


