Message-ID: <504768B5.4090205@amd.com>
Date: Wed, 5 Sep 2012 09:59:01 -0500
From: Nick Jones
To: user@hadoop.apache.org
Subject: Re: One petabyte of data loading into HDFS with in 10 min.

Since cost wasn't mentioned as a requirement... an army of people mounting physical drives containing the original dataset on the cluster machines, with an M/R job copying from local disk, would likely be faster. There are also 40 Gbps InfiniBand solutions available. The replication traffic could also be pushed to a separate network and allowed to become consistent eventually (presumably full consistency isn't required within the 10 minutes), lowering the primary network's transfer requirement to 1 PB.
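As a quick sanity check on that last point (a minimal sketch, assuming decimal units: 1 PB = 10^15 bytes, link rates in bits per second):

```python
# Aggregate bandwidth needed to land 1 PB (un-replicated payload) in
# 10 minutes, with replication deferred to a separate network.
import math

PAYLOAD_BYTES = 10**15   # 1 PB, decimal convention
WINDOW_S = 10 * 60       # the 10-minute target

agg_bps = PAYLOAD_BYTES * 8 / WINDOW_S
print(f"aggregate ingest bandwidth: {agg_bps / 1e12:.1f} Tbps")  # ~13.3 Tbps

# Expressed as a count of the 40 Gbps InfiniBand links mentioned above
links_40g = math.ceil(agg_bps / 40e9)
print(f"40 Gbps links needed: {links_40g}")  # 334
```

So even with replication moved off the primary network, the cluster still needs on the order of 13 Tbps of aggregate ingest bandwidth for the 10-minute window.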
On 09/05/2012 09:43 AM, Cosmin Lehene wrote:
> Here's an extremely naïve ballpark estimation: at theoretical hardware
> speed, for 3 PB representing 1 PB with 3x replication.
>
> Over a single 1 Gbps connection (and I'm not sure you can actually
> reach 1 Gbps):
> (3 petabytes) / (1 Gbps) = 291.271111 days
>
> So you'd need at least 40,000 1 Gbps network cards to get that in 10
> minutes :) - (3 PB / 1 Gbps) / 40,000
>
> The actual number of nodes would depend a lot on the actual network
> architecture, the type of storage you use (SSD, HDD), etc.
>
> Cosmin
>
> From: prabhu K
> Reply-To: "user@hadoop.apache.org"
> Date: Wednesday, September 5, 2012 3:21 PM
> To: "user@hadoop.apache.org"
> Subject: One petabyte of data loading into HDFS with in 10 min.
>
> Hi Users,
>
> Please clarify the below questions.
>
> 1. To load one petabyte of data into HDFS/Hive within 10 minutes, how
> many slave machines (DataNodes) are required?
> 2. To load one petabyte of data into HDFS/Hive within 10 minutes, what
> cloud configuration is required?
>
> Please suggest and help me on this.
>
> Thanks & Regards,
> Prabhu.
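Cosmin's ballpark above is easy to reproduce. A minimal sketch, assuming decimal units (1 PB = 10^15 bytes, 1 Gbps = 10^9 bit/s); the quoted ~291-day figure evidently used a slightly different unit convention, but the 40,000-link conclusion holds either way:

```python
# Reproduce the back-of-the-envelope numbers from the quoted estimate.
TOTAL_BYTES = 3 * 10**15   # 3 PB = 1 PB payload with 3x replication
LINK_BPS = 10**9           # one 1 Gbps link

seconds = TOTAL_BYTES * 8 / LINK_BPS
days = seconds / 86400
print(f"single 1 Gbps link: {days:.1f} days")  # ~277.8 days

target_seconds = 10 * 60
links_needed = TOTAL_BYTES * 8 / LINK_BPS / target_seconds
print(f"1 Gbps links for a 10-minute load: {links_needed:,.0f}")  # 40,000
```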