Date: Mon, 12 Dec 2011 22:40:24 +0530
Subject: Awesome post on Hadoop. Some questions...
From: prasenjit mukherjee
To: common-user

I was really enthralled to read this post:
http://bradhedlund.com/2011/09/10/understanding-hadoop-clusters-and-the-network/

Great job. Some related questions:

1. The article says that HDFS always keeps 2 copies in the same rack and the 3rd in a different rack. As far as I can tell, this only speeds up the HDFS "put" (file creation) time. Wouldn't it be better to spread the copies across 3 racks? What other advantage does this 2+1 approach have?

2. In HDFS the client reads blocks sequentially. Why can't the client read the blocks in parallel? Wouldn't that speed up lookups from the client's perspective?

3. There are some cases in which a Data Node daemon itself will need to read a block of data from HDFS. When would a data node need to read from other data nodes? Is it when the split size is larger than the block size? Even in that case, it is the TaskTracker that should ask for the data, not the data node.

-Thanks
Prasenjit