Mailing-List: contact hdfs-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of Hassen.Riahi@cern.ch
 designates 137.138.144.179 as permitted sender)
Message-ID: <9BA44A9C-57F4-4DC2-897B-D848D5D280EE@cern.ch>
From: Hassen Riahi <hassen.riahi@cern.ch>
To: <hdfs-user@hadoop.apache.org>
In-Reply-To: <BANLkTimFtDJxBq+0BLHL9CybwcJVV_Srqg@mail.gmail.com>
Content-Type: multipart/alternative; boundary="Apple-Mail-1-426369457"
MIME-Version: 1.0 (Apple Message framework v936)
Subject: Re: Read files from hdfs
Date: Wed, 11 May 2011 23:58:03 +0200
References: <E9609F20-0C44-4E78-8533-7CA4A512C416@cern.ch>
 <5A9A3B71E3132548B5A0B11C65C76538010BFDBB36@MX10A.corp.emc.com>
 <BANLkTimFtDJxBq+0BLHL9CybwcJVV_Srqg@mail.gmail.com>
Keywords: CERN SpamKiller Note: -50

--Apple-Mail-1-426369457
Content-Type: text/plain; charset="GB2312"; format=flowed; delsp=yes
Content-Transfer-Encoding: quoted-printable

Thank you Elton and Stanley for your reply.

Given that we are not running map reduce jobs (at least not yet), and
assuming that the read is sequential and the network is not heavily
used, I would expect to see, in general, a degradation of performance
when reading 1 file from hdfs (its blocks will be read sequentially
from different datanodes) compared to reading it from a conventional
filesystem (which stores the file without splitting it). Is that right?
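
To make the question concrete, here is a rough back-of-the-envelope
sketch, not a measurement. It assumes a 2-block file as in my first
mail, a 64 MB block size, the same 100 MB/s throughput in both cases,
and a made-up 5 ms setup cost each time the client moves to the next
block. All of these numbers and names are assumptions for illustration
only.

```python
import math

MB = 1024 * 1024


def sequential_read_seconds(file_bytes, block_bytes, throughput_bps, per_block_setup_s):
    """Estimate the time to read a file one block after another.

    Toy model: each block costs a fixed setup time (hypothetical, e.g.
    connecting to a datanode) plus the time to transfer its bytes.
    """
    n_blocks = math.ceil(file_bytes / block_bytes)
    transfer_s = file_bytes / throughput_bps
    return n_blocks * per_block_setup_s + transfer_s


file_bytes = 128 * MB      # assumed: 1 file made of 2 blocks
block_bytes = 64 * MB      # assumed block size
throughput = 100 * MB      # assumed: same bytes/second for both cases

# Block-by-block read with an assumed 5 ms setup cost per block.
hdfs_s = sequential_read_seconds(file_bytes, block_bytes, throughput, 0.005)

# Unsplit file: modeled as a single "block" with no setup cost.
local_s = sequential_read_seconds(file_bytes, file_bytes, throughput, 0.0)

print(f"block-by-block: {hdfs_s:.3f} s, unsplit: {local_s:.3f} s")
```

In this toy model the only difference is the per-block setup cost, so
the gap depends on how large that cost is relative to the time spent
transferring one block. Real numbers would have to be measured.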

Thanks,
Hassen


> Hassen,
>
> Read in hdfs is sequential, i.e. read one block after another. Each
> time the client will connect to one data node to read a block. Then
> connect to another (or the same) data node to read next block.
> The reason for this sequential design, I guess, is avoiding n/w
> traffic explosion in a heavy map reduce job.
>
> -Elton
>
> 2011/5/8 <stanley.shi@emc.com>
> To my understanding, the reader reads file blocks in parallel.
>
> -----Original Message-----
> From: Hassen Riahi [mailto:hassen.riahi@cern.ch]
> Sent: May 7, 2011 23:50
> To: hdfs-user@hadoop.apache.org
> Subject: Read files from hdfs
>
> Hi all,
>
> Is the read operation of 1 file stored in hdfs done in parallel?
>
> I mean, let's say that I have 1 file split into 2 blocks (hdfs blocks)
> and each block is stored in 1 rack.
> When reading this file, are both blocks read in parallel? Or is the
> first block read and then, once done, the read of the second block
> begins? If the latter is right, the read of files in hdfs is then
> sequential. Is that right, or am I missing something?
>
> Thanks,
> Hassen
>
>


--Apple-Mail-1-426369457--