From: Hassen Riahi
Reply-To: hdfs-user@hadoop.apache.org
Subject: Re: Read files from hdfs
Date: Wed, 11 May 2011 23:58:03 +0200
Message-ID: <9BA44A9C-57F4-4DC2-897B-D848D5D280EE@cern.ch>

Thank you Elton and Stanley for your replies.

Given that we are not running MapReduce jobs (at least so far), and assuming that reads are sequential and that the network is not heavily used, I would expect to see, in general, a degradation of performance when reading one file from HDFS (its blocks are read sequentially from different datanodes) compared to reading it from a conventional filesystem (which stores a file without splitting it). Is that right?

Thanks,
Hassen

> Hassen,
>
> Reads in HDFS are sequential, i.e. one block is read after another. Each
> time, the client connects to one datanode to read a block, then connects
> to another (or the same) datanode to read the next block.
> The reason for this sequential design, I guess, is to avoid a network
> traffic explosion in a heavy MapReduce job.
>
> -Elton
>
> 2011/5/8
> To my understanding, the reader reads file blocks in parallel.
>
> -----Original Message-----
> From: Hassen Riahi [mailto:hassen.riahi@cern.ch]
> Sent: May 7, 2011 23:50
> To: hdfs-user@hadoop.apache.org
> Subject: Read files from hdfs
>
> Hi all,
>
> Is the read of a single file stored in HDFS done in parallel?
>
> I mean, let's say that I have one file split into two HDFS blocks and
> each block is stored in one rack.
> When reading this file, are both blocks read in parallel, or is the first
> block read and, once that is done, the read of the second block begins?
> If the latter is right, then reading files in HDFS is sequential.
> Is that right, or am I missing something?
>
> Thanks,
> Hassen
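As a rough illustration of the question in this thread, here is a toy cost model in Python. It is not taken from Hadoop: it only assumes, as Elton describes, that the client reads one block at a time and connects to a datanode for each block. The function names, the fixed per-block setup cost, and the single throughput figure are all hypothetical simplifications. Under those assumptions, the extra cost compared to an unsplit file is just the additional connection setups.

```python
def block_count(file_bytes: int, block_bytes: int) -> int:
    """Number of blocks needed to hold file_bytes (integer ceiling division)."""
    return (file_bytes + block_bytes - 1) // block_bytes


def sequential_read_seconds(file_bytes: int, block_bytes: int,
                            bytes_per_second: float,
                            setup_seconds: float) -> float:
    """Estimated time to read a file one block after another.

    Assumption: every block costs one connection setup, and the data
    streams at the same rate for every block.
    """
    blocks = block_count(file_bytes, block_bytes)
    return blocks * setup_seconds + file_bytes / bytes_per_second


def unsplit_read_seconds(file_bytes: int, bytes_per_second: float,
                         setup_seconds: float) -> float:
    """Estimated time to read the same file stored in one piece.

    Assumption: a single setup cost and the same streaming rate as above.
    """
    return setup_seconds + file_bytes / bytes_per_second


if __name__ == "__main__":
    MIB = 1024 ** 2
    # Hypothetical numbers: a 128 MiB file in 64 MiB blocks (two blocks,
    # as in the original question), 100 MiB/s, 10 ms setup per connection.
    split = sequential_read_seconds(128 * MIB, 64 * MIB, 100 * MIB, 0.01)
    whole = unsplit_read_seconds(128 * MIB, 100 * MIB, 0.01)
    print(f"block-by-block: {split:.3f} s, unsplit: {whole:.3f} s")
```

In this model the overhead grows with the number of blocks, so a large block size keeps it small relative to the transfer time. Real behaviour also depends on disks, replica placement, and network load, none of which the sketch tries to capture.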