hadoop-hdfs-user mailing list archives

From Hassen Riahi <hassen.ri...@cern.ch>
Subject Re: Read files from hdfs
Date Thu, 12 May 2011 11:08:28 GMT
Thanks for the reply.

Maybe I was not clear enough when explaining the use case, sorry.

1- we are not running MapReduce jobs
2- the read from HDFS is sequential
3- the network is not heavily used

I want to read one file remotely from a distributed filesystem, and I have two options:

1- reading it from HDFS
2- reading it from a conventional distributed filesystem (which stores the
whole file on a single machine, rather than splitting it into blocks and
distributing them as HDFS does)

Option 1 could be slower than option 2, since 1 introduces more overhead
than 2 (for each new HDFS block to read, the client has to establish a
connection with the datanode holding that block...).

Is that right?
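Hassen's comparison of options 1 and 2 can be written as a back-of-the-envelope cost model. This is a hypothetical sketch: the block size, throughput, and per-connection setup cost are illustrative assumptions, not measured HDFS figures, and `num_blocks` / `estimated_read_seconds` are invented helpers. The model gives both options the same transfer time and charges option 1 one connection setup per block, versus a single setup for option 2.

```python
import math

MB = 1024 * 1024

def num_blocks(file_size: int, block_size: int) -> int:
    """Number of fixed-size blocks needed to hold file_size bytes."""
    return max(1, math.ceil(file_size / block_size))

def estimated_read_seconds(file_size: int, throughput: float,
                           setup_cost: float, connections: int) -> float:
    """Assumed model: transfer time plus a fixed setup cost per connection."""
    return file_size / throughput + connections * setup_cost

# Illustrative assumptions, not measured HDFS figures.
file_size = 1024 * MB    # 1 GB file
block_size = 64 * MB     # assumed block size
throughput = 100 * MB    # assumed bytes per second, same for both options
setup_cost = 0.01        # assumed seconds to open one datanode connection

hdfs_connections = num_blocks(file_size, block_size)  # option 1: one per block
hdfs_time = estimated_read_seconds(file_size, throughput, setup_cost, hdfs_connections)
single_time = estimated_read_seconds(file_size, throughput, setup_cost, 1)  # option 2

print(hdfs_connections)                    # 16
print(round(hdfs_time - single_time, 2))   # 0.15 (15 extra setups)
```

With these made-up numbers the extra setups add about 0.15 s to a transfer of roughly 10 s. Real connection latency and throughput would have to be measured to say whether option 1 is noticeably slower in practice.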


> Yes, it could get slower, because the operation would now involve a disk
> read AND a network transfer (with the other small overheads it carries
> along).
> 2011/5/12 Hassen Riahi <hassen.riahi@cern.ch>:
>> Thank you Elton and Stanley for your replies.
>> Given that we are not running MapReduce jobs (at least until now), that
>> the read is sequential, and that the network is not heavily used, I would
>> in general expect a performance degradation when reading one file from
>> HDFS (HDFS blocks will be read sequentially from different datanodes)
>> compared to reading it from a conventional filesystem (which stores the
>> file without splitting it). Is that right?
>> Thanks,
>> Hassen
>> Hassen,
>> A read in HDFS is sequential, i.e. it reads one block after another. Each
>> time, the client connects to one datanode to read a block, then connects
>> to another (or the same) datanode to read the next block.
>> The reason for this sequential design, I guess, is to avoid a network
>> traffic explosion in a heavy MapReduce job.
>> -Elton
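Elton's description of the read path (one block at a time, with a new datanode connection for each block) can be modelled with a toy in-memory cluster. This is a hypothetical simulation, not the HDFS client code: the `Datanode` class, its `connect` and `read_block` methods, and the block placement are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Datanode:
    """Toy datanode holding block contents by block id (hypothetical)."""
    name: str
    blocks: dict = field(default_factory=dict)
    connections: int = 0

    def connect(self) -> "Datanode":
        # Each call stands for the client opening a new connection.
        self.connections += 1
        return self

    def read_block(self, block_id: int) -> bytes:
        return self.blocks[block_id]

def read_sequentially(block_locations) -> bytes:
    """Read blocks in file order, connecting to the node holding each one."""
    data = bytearray()
    for block_id, node in block_locations:
        conn = node.connect()              # one connection per block
        data += conn.read_block(block_id)  # next block starts only after this one
    return bytes(data)

dn1 = Datanode("dn1", {0: b"hello "})
dn2 = Datanode("dn2", {1: b"world"})
content = read_sequentially([(0, dn1), (1, dn2)])
print(content)                              # b'hello world'
print(dn1.connections + dn2.connections)    # 2
```

The loop never starts block 1 before block 0 has been fully read, which is the sequential behaviour Elton describes; the connection counter makes the per-block setup Hassen is worried about explicit.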
>> 2011/5/8 <stanley.shi@emc.com>
>>> To my understanding, the reader reads file blocks in parallel.
>>> -----Original Message-----
>>> From: Hassen Riahi [mailto:hassen.riahi@cern.ch]
>>> Sent: 7 May 2011 23:50
>>> To: hdfs-user@hadoop.apache.org
>>> Subject: Read files from hdfs
>>> Hi all,
>>> is the read operation of one file stored in HDFS done in parallel?
>>> I mean, let's say that I have one file split into 2 HDFS blocks and
>>> each block is stored in one rack.
>>> When reading this file, are both blocks read in parallel? Or is the first
>>> block read, and then, once that is done, the read of the second block
>>> begins?
>>> If the latter is right, then reading files in HDFS is sequential.
>>> Is that right, or am I missing something?
>>> Thanks,
>>> Hassen
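For contrast with the sequential path discussed in this thread, here is roughly what reading Hassen's two blocks in parallel could look like on the client side. This is a hypothetical sketch: the blocks are in-memory byte strings standing in for one block per rack, and `RACKS` and `fetch_block` are invented names. With a real HDFS client, this kind of parallelism would be something the application arranges itself (for example by issuing concurrent reads at different file offsets); that is an assumption, not something confirmed in the thread.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placement: block id -> contents, one block per rack.
RACKS = {
    0: b"first block, rack A | ",
    1: b"second block, rack B",
}

def fetch_block(block_id: int) -> bytes:
    """Invented stand-in for fetching one block from its datanode."""
    return RACKS[block_id]

def read_in_parallel(block_ids) -> bytes:
    """Fetch all blocks concurrently, then reassemble them in file order."""
    with ThreadPoolExecutor(max_workers=len(block_ids)) as pool:
        # map() returns results in the order of block_ids, not completion order.
        parts = list(pool.map(fetch_block, block_ids))
    return b"".join(parts)

print(read_in_parallel([0, 1]))
```

Because `map` preserves input order, the reassembled bytes match the file layout even if the second block happens to arrive first.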
> -- 
> Harsh J
