From: Cosmin Lehene
To: "user@hadoop.apache.org"
Date: Wed, 5 Sep 2012 15:43:14 +0100
Subject: Re: One petabyte of data loading into HDFS with in 10 min.

Here's an extremely naïve ballpark estimate at theoretical hardware speed, for 3PB representing 1PB with 3x replication.

Over a single 1Gbps connection (and I'm not sure you can actually reach 1Gbps):
(3 petabytes) / (1 Gbps) = 291.271111 days

So you'd need at least 40,000 1Gbps network cards to get that in 10 minutes :) - (3PB/1Gbps)/40000
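
For anyone who wants to redo the arithmetic, here is a minimal sketch in Python. The function names are made up for illustration, and it assumes ideal line rate with no protocol overhead, no disk limits and perfectly even distribution across links. Note that the 291.27-day figure corresponds to binary prefixes (2^50 bytes per PB, 2^30 bits/s per Gbps), while decimal prefixes give about 277.8 days and exactly 40,000 links.

```python
import math

# Two unit conventions (an assumption about how the figures were derived):
PB_BINARY, GBPS_BINARY = 2**50, 2**30        # bytes per PB, bits/s per Gbps
PB_DECIMAL, GBPS_DECIMAL = 10**15, 10**9

REPLICATION = 3          # HDFS-style 3x replication, as in the estimate above
DEADLINE_SECONDS = 10 * 60


def transfer_seconds(data_bytes, link_bits_per_second):
    """Time to push data_bytes over one link at full theoretical speed."""
    return data_bytes * 8 / link_bits_per_second


def links_needed(data_bytes, link_bits_per_second, deadline_seconds):
    """Smallest number of parallel links that meets the deadline."""
    single_link = transfer_seconds(data_bytes, link_bits_per_second)
    return math.ceil(single_link / deadline_seconds)


for label, pb, gbps in [("binary", PB_BINARY, GBPS_BINARY),
                        ("decimal", PB_DECIMAL, GBPS_DECIMAL)]:
    total_bytes = REPLICATION * pb           # 1PB stored three times
    days = transfer_seconds(total_bytes, gbps) / 86400
    links = links_needed(total_bytes, gbps, DEADLINE_SECONDS)
    print(f"{label}: {days:.2f} days on one link, {links} links for 10 minutes")
```

Either way the answer lands in the 40,000 to 42,000 range of 1Gbps links, which is a lower bound rather than a node count.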

The actual number of nodes would depend a lot on the actual network architecture, the type of storage you use (SSD, HDD), etc.
 
Cosmin
From: prabhu K <prabhu.hadoop@gmail.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Wednesday, September 5, 2012 3:21 PM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: One petabyte of data loading into HDFS with in 10 min.

Hi Users,

Please clarify the questions below.

1. To load one petabyte of data into HDFS/Hive within 10 minutes, how many slave (DataNode) machines are required?

2. To load one petabyte of data into HDFS/Hive within 10 minutes, what configuration setup is required for cloud computing?

Please suggest and help me on this.

Thanks & Regards,
Prabhu.
 