hadoop-hdfs-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: hdfs block size cont.
Date Thu, 17 Mar 2011 16:14:11 GMT
Not in the case of .gz files [since no splitting is done, the mapper
will likely read 128 MB locally from a resident DN, and could then
read the remaining 128 MB over the network from another DN if the
next block does not reside on the same DN as well, thereby
introducing a network read cost].
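
A minimal sketch of the splittability check behind that behaviour,
assuming a Hadoop release that ships SplittableCompressionCodec; the
input path is made up, and the code mirrors (rather than calls) the
isSplitable() test used by TextInputFormat:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.SplittableCompressionCodec;

public class GzipSplitCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    CompressionCodecFactory factory = new CompressionCodecFactory(conf);
    Path input = new Path("/user/lior/input/part-0001.gz"); // hypothetical path

    // The codec is chosen by file suffix; .gz maps to GzipCodec.
    CompressionCodec codec = factory.getCodec(input);

    // Only uncompressed files or splittable codecs (e.g. bzip2) get more
    // than one input split; a plain GzipCodec file is read end-to-end by a
    // single mapper, crossing block (and possibly DataNode) boundaries.
    boolean splittable = (codec == null)
        || (codec instanceof SplittableCompressionCodec);
    System.out.println(input + " splittable? " + splittable); // false for .gz
  }
}

Because the check fails for GzipCodec, the whole file becomes one split,
and the single mapper reads across block boundaries as described above.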

On Thu, Mar 17, 2011 at 8:44 PM, Lior Schachter <liors@infolinks.com> wrote:
> Yes, but with 128M gzip files/block size the M/R will work better, no?
>
> anyhow, thanks for the useful information.
>
> On Thu, Mar 17, 2011 at 5:07 PM, Harsh J <qwertymaniac@gmail.com> wrote:
>>
>> On Thu, Mar 17, 2011 at 7:51 PM, Lior Schachter <liors@infolinks.com>
>> wrote:
>> > Currently each gzip file is about 250MB (60 files = 15G total), so we
>> > use 256M blocks.
>>
>> Darn, I ought to sleep a bit more. I did a file/gb and read it as gb/file
>> mehh..
>>
>> >
>> > However, I understand that smaller files/blocks are better for
>> > utilizing M/R parallel processing.
>>
>> Yes, this is true in the case of text/sequence files.
>>
>> > So maybe having 128M gzip files with a corresponding 128M block size
>> > would be better?
>>
>> Why not 256M for all your ~250MB _gzip_ files, making each one nearly a
>> single block, since they would not be split anyway?
>>
>> --
>> Harsh J
>> http://harshj.com
>
>



-- 
Harsh J
http://harshj.com
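
On the quoted suggestion of ~256M blocks for the ~250MB gzip files: a
minimal sketch of uploading with a larger per-file block size set on the
client side (the property name is the pre-2.x "dfs.block.size"; the paths
are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutWithLargerBlocks {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Block size is a per-file, client-side choice; the cluster default
    // does not need to change. 256 MB here so each ~250MB .gz file fits
    // in a single block.
    conf.setLong("dfs.block.size", 256L * 1024 * 1024);

    FileSystem fs = FileSystem.get(conf);
    // Hypothetical paths; the target file is created with the block size
    // taken from the configuration above.
    fs.copyFromLocalFile(new Path("file:///data/logs-0001.gz"),
                         new Path("/user/lior/input/logs-0001.gz"));
  }
}

The same per-file override can also be passed as a -D generic option to
hadoop fs -put.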
