hadoop-hdfs-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: hdsf block size cont.
Date Thu, 17 Mar 2011 15:07:00 GMT
On Thu, Mar 17, 2011 at 7:51 PM, Lior Schachter <liors@infolinks.com> wrote:
> Currently each gzip file is about 250MB (×60 files = 15G) so we have 256M
> blocks.

Darn, I ought to sleep a bit more. I did a file/GB and read it as GB/file, meh.

> However, I understand that in order to better utilize M/R parallel processing,
> smaller files/blocks are better.

Yes, this is true in the case of text/sequence files, since those are splittable.

> So maybe having 128M gzip files with a corresponding 128M block size would be
> better?

Why not a 256MB block size for all your ~250MB _gzip_ files? Each file would
then fit in nearly one block, and since gzip files are not splittable they
would be processed by a single mapper anyway.
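To illustrate the arithmetic (a minimal sketch, not from the original thread): a non-splittable gzip file always becomes a single map task, so if it spans more than one HDFS block, the mapper has to fetch the extra block(s) over the network. With a 256MB block size, each ~250MB file sits entirely in one block.

```python
def num_blocks(file_size_bytes, block_size_bytes):
    # Number of HDFS blocks a file occupies (ceiling division).
    return -(-file_size_bytes // block_size_bytes)

MB = 1024 * 1024
gzip_file = 250 * MB  # one of the ~250MB gzip files from the thread

# With 128MB blocks, each file spans two blocks; since gzip is not
# splittable, the lone mapper must read the second block, likely remotely.
print(num_blocks(gzip_file, 128 * MB))  # 2

# With 256MB blocks, the whole file fits in a single (local) block.
print(num_blocks(gzip_file, 256 * MB))  # 1
```

The same arithmetic shows why smaller blocks help for splittable text/sequence files: each block becomes its own map task, so more blocks means more parallelism.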

Harsh J
