hadoop-hdfs-user mailing list archives

From Hassen Riahi <hassen.ri...@cern.ch>
Subject Re: Read files from hdfs
Date Wed, 11 May 2011 21:58:03 GMT
Thank you Elton and Stanley for your reply.

Given that we are not running MapReduce jobs (at least so far), and
assuming that the read is sequential and the network is not heavily
used, I would in general expect to see a performance degradation when
reading one file from HDFS (its blocks are read sequentially from
different datanodes) compared to reading it from a conventional
filesystem (which stores a file without splitting it). Is that right?
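
One rough way to reason about the comparison above is a back-of-envelope model: total read time is the transfer time plus a fixed cost paid once per block (for example, opening a connection to the next datanode). The function name, the per-block overhead, and all numbers below are illustrative assumptions, not values measured on HDFS or taken from this thread.

```python
MB = 1024 * 1024

def sequential_read_seconds(file_bytes, block_bytes, throughput_bps,
                            per_block_overhead_s):
    """Estimate the time to read a file one block after another.

    All parameters are hypothetical inputs to a simple model.
    """
    # Number of blocks, rounding up to count a partial last block.
    num_blocks = -(-file_bytes // block_bytes)
    transfer_s = file_bytes / throughput_bps
    return transfer_s + num_blocks * per_block_overhead_s

# Assumed example: 128 MB file, 64 MB blocks, 100 MB/s, 10 ms per block.
with_overhead = sequential_read_seconds(128 * MB, 64 * MB, 100 * MB, 0.01)
# Same file read as a single unit with no per-block cost.
without_overhead = sequential_read_seconds(128 * MB, 128 * MB, 100 * MB, 0.0)
print(f"{with_overhead:.2f} s vs {without_overhead:.2f} s")
```

In this model the only difference between the two cases is the per-block term, so whether any degradation is noticeable depends on how large that overhead is relative to the transfer time of one block.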


> Hassen,
> Reads in HDFS are sequential, i.e. one block is read after another.
> For each block, the client connects to one datanode to read it, then
> connects to another (or the same) datanode to read the next block.
> The reason for this sequential design, I guess, is to avoid an
> explosion of network traffic during a heavy MapReduce job.
> -Elton
> 2011/5/8 <stanley.shi@emc.com>
> To my understanding, the reader reads file blocks in parallel.
> -----Original Message-----
> From: Hassen Riahi [mailto:hassen.riahi@cern.ch]
> Sent: May 7, 2011 23:50
> To: hdfs-user@hadoop.apache.org
> Subject: Read files from hdfs
> Hi all,
> Is the read operation of one file stored in HDFS done in parallel?
> I mean, let's say that I have one file split into 2 HDFS blocks and
> each block is stored in one rack.
> When reading this file, are both blocks read in parallel? Or is the
> first block read and then, once that is done, the read of the second
> block begins?
> If the latter is right, then reading files in HDFS is sequential.
> Is that right, or am I missing something?
> Thanks,
> Hassen
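
The two access patterns discussed in this thread (one block after another versus both blocks at once) can be sketched with a toy in-memory model. This is not the HDFS client API: the `BLOCKS` layout, the datanode names, and the functions are hypothetical stand-ins, and the sketch does not settle which pattern the real client uses.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: one file split into 2 blocks, each on its own datanode.
BLOCKS = [
    ("datanode-rack1", b"first block "),
    ("datanode-rack2", b"second block"),
]

def read_block(index):
    """Stand-in for connecting to one datanode and fetching one block."""
    _datanode, data = BLOCKS[index]
    return data

def read_sequential():
    # Fetch block 0, and only then block 1.
    return b"".join(read_block(i) for i in range(len(BLOCKS)))

def read_parallel():
    # Fetch all blocks concurrently; map() returns results in block order.
    with ThreadPoolExecutor(max_workers=len(BLOCKS)) as pool:
        return b"".join(pool.map(read_block, range(len(BLOCKS))))

print(read_sequential() == read_parallel())
```

Both functions return the same bytes; they differ only in whether the second fetch starts before the first one has finished.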
