Subject: Re: Hadoop cluster network requirement
From: Allen Wittenauer
Date: Sun, 31 Jul 2011 19:16:29 -0700
Reply-To: common-user@hadoop.apache.org

On Jul 31, 2011, at 12:08 PM, wrote:

> I was asked by our IT folks if we can put hadoop name nodes storage
> using a shared disk storage unit.

What do you mean by "shared disk storage unit"? There are lots of
products out there that would claim this, so actual deployment
semantics are important.

> Does anyone have experience of how much IO throughput is required on
> the name nodes?
IO throughput is completely dependent upon how many changes are being
applied to the file system and the frequency of edits log merging. In
the majority of cases it is "not much". What tends to happen where the
storage is shared (such as a NAS) is that the *other* traffic overloads
the storage and blocks the writes for too long, so the NN declares the
storage dead.

> What are the latency/data throughput requirements between the master
> and data nodes - can this tolerate network routing?

If you mean "different data centers", then no. If you mean "same data
center, but with routers in between", then probably yes, but you add
several more failure points, so this isn't recommended.

> Did anyone published any throughput requirement for the best network
> setup recommendation?

Not that I know of. It is very much dependent upon the actual workload
being performed. But I wouldn't deploy anything slower than a 1:4
overcommit (uplink-to-host) on the DN side for anything
real/significant.

> This message is for the designated recipient only and may contain
> privileged, proprietary, or otherwise private information. If you have
> received it in error, please notify the sender immediately and delete
> the original. Any other use of the email by you is prohibited.

Lawyers are funny people. I wonder how much they got paid for this one.
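
For reference, the 1:4 overcommit (uplink-to-host) figure above can be read as the ratio of a switch's aggregate uplink bandwidth to the aggregate bandwidth of the hosts behind it. A minimal sketch of that arithmetic follows; the function names, port counts, and link speeds are hypothetical examples, not numbers from this thread.

```python
def overcommit_ratio(hosts: int, host_gbps: float,
                     uplinks: int, uplink_gbps: float) -> float:
    """Aggregate host-facing bandwidth divided by aggregate uplink bandwidth.

    A return value of 4.0 corresponds to a 1:4 (uplink-to-host) overcommit.
    """
    if hosts <= 0 or uplinks <= 0 or host_gbps <= 0 or uplink_gbps <= 0:
        raise ValueError("counts and speeds must be positive")
    return (hosts * host_gbps) / (uplinks * uplink_gbps)


def within_guideline(ratio: float, max_ratio: float = 4.0) -> bool:
    """True if the overcommit is no worse than 1:max_ratio.

    The default of 4.0 reflects the 1:4 rule of thumb in the message;
    treat it as a starting point, since the right value is workload dependent.
    """
    return ratio <= max_ratio


if __name__ == "__main__":
    # Hypothetical rack: 40 DataNodes at 1 Gbps each, one 10 Gbps uplink.
    ratio = overcommit_ratio(hosts=40, host_gbps=1, uplinks=1, uplink_gbps=10)
    print(f"overcommit 1:{ratio:g}, within guideline: {within_guideline(ratio)}")
```

With 48 hosts on the same single 10 Gbps uplink the ratio becomes 1:4.8, which falls outside the rule of thumb.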