hadoop-hdfs-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Read files from hdfs
Date Thu, 12 May 2011 05:35:55 GMT
Yes, it could get slower, because the operation would now involve a disk
read AND a network transfer (with the other small overheads it carries).
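
To put rough numbers on that, here is a minimal cost sketch. It is not a measurement and not anything from the Hadoop code base: the function names, the throughput figures, and the per-block overhead are all made-up assumptions, and it assumes the datanode overlaps its disk reads with its network sends, so the slower of the two bounds the throughput.

```python
# Rough cost model. Every number below is a made-up assumption, not a measurement.
import math


def local_read_seconds(file_mb, disk_mb_per_s):
    """Time to stream a whole file from a local disk."""
    return file_mb / disk_mb_per_s


def remote_read_seconds(file_mb, block_mb, disk_mb_per_s, net_mb_per_s,
                        per_block_overhead_s):
    """Time to read the same file block by block from remote datanodes.

    Assumption: disk reads and network sends overlap on the datanode, so the
    slower of the two bounds the throughput; each block adds a fixed overhead
    (connection setup and similar).
    """
    blocks = math.ceil(file_mb / block_mb)
    transfer = file_mb / min(disk_mb_per_s, net_mb_per_s)
    return transfer + blocks * per_block_overhead_s


# Hypothetical setup: a 128 MB file in 64 MB blocks, a 100 MB/s disk,
# a 50 MB/s network path, and 0.05 s of overhead per block.
local = local_read_seconds(128, disk_mb_per_s=100)
remote = remote_read_seconds(128, block_mb=64, disk_mb_per_s=100,
                             net_mb_per_s=50, per_block_overhead_s=0.05)
print(f"local: {local:.2f} s, remote: {remote:.2f} s")
```

Under these assumptions, when the network is faster than the disk, the gap shrinks to just the per-block overhead, which is the lightly loaded network case asked about below.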

2011/5/12 Hassen Riahi <hassen.riahi@cern.ch>:
> Thank you, Elton and Stanley, for your replies.
> Given that we are not running MapReduce jobs (at least not yet), and
> assuming that the read is sequential and the network is not heavily
> used, I would expect to see, in general, a performance degradation when
> reading one file from HDFS (its blocks will be read sequentially from
> different datanodes) compared with reading it from a conventional filesystem
> (which stores files without splitting them). Is that right?
> Thanks,
> Hassen
> Hassen,
> Reads in HDFS are sequential, i.e. one block is read after another. For each
> block, the client connects to one datanode to read it, then connects to
> another (or the same) datanode to read the next block.
> The reason for this sequential design, I guess, is to avoid a network traffic
> explosion in heavy MapReduce jobs.
> -Elton
> 2011/5/8 <stanley.shi@emc.com>
>> To my understanding, the reader reads file blocks in parallel.
>> -----Original Message-----
>> From: Hassen Riahi [mailto:hassen.riahi@cern.ch]
>> Sent: May 7, 2011 23:50
>> To: hdfs-user@hadoop.apache.org
>> Subject: Read files from hdfs
>> Hi all,
>> Is the read operation of one file stored in HDFS done in parallel?
>> I mean, let's say that I have one file split into 2 blocks (HDFS blocks) and
>> each block is stored on one rack.
>> When reading this file, are both blocks read in parallel? Or is the first
>> block read and then, once that is done, does the read of the second block begin?
>> If the latter is right, then reading files in HDFS is sequential.
>> Is that right, or am I missing something?
>> Thanks,
>> Hassen
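
The block-by-block read that Elton describes in the quoted thread can be pictured with a toy model. This is only a sketch of the access pattern, not the HDFS client: the `Block` class, the `read_file_sequentially` function, and the datanode names are hypothetical, and no real network or Hadoop API is involved.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int   # position of the block within the file
    datanode: str   # hypothetical name of the node holding this block
    data: bytes


def read_file_sequentially(blocks):
    """Read blocks one at a time, in file order, each from a single datanode."""
    contents = bytearray()
    contacted = []  # (block_id, datanode) pairs in the order they were read
    for block in sorted(blocks, key=lambda b: b.block_id):
        # "Connect" to the one datanode chosen for this block...
        contacted.append((block.block_id, block.datanode))
        # ...and consume the whole block before moving on to the next one.
        contents.extend(block.data)
    return bytes(contents), contacted


# One file split into 2 blocks, each stored on a different (made-up) node.
blocks = [
    Block(1, "datanode-rack2", b"second half"),
    Block(0, "datanode-rack1", b"first half, "),
]
data, order = read_file_sequentially(blocks)
print(data)
print(order)
```

The second block is not touched until the first has been fully consumed, which is the behaviour the original question calls sequential.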

Harsh J
