hadoop-common-user mailing list archives

From Elia Mazzawi <elia.mazz...@casalemedia.com>
Subject Re: merging into MapFile
Date Tue, 09 Dec 2008 22:21:29 GMT
It has to do with the HDFS block size.

I had many small files, and performance became much better when I
merged them.

The default block size is 64 MB, so re-chunk your files to <= 64 MB
each (which is what I did and recommend),
or reconfigure your Hadoop block size.

The relevant property (in hadoop-default.xml) is:

  <property>
    <name>dfs.block.size</name>
    <value>67108864</value>
    <description>The default block size for new files.</description>
  </property>

Do something like:
cat * | rotatelogs ./merged/m 64M
Apache's rotatelogs will merge and chop the data for you.
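If rotatelogs isn't available, a rough equivalent can be sketched with
coreutils `split` (the `inputs/` and `merged/` paths and the `m_` prefix
here are placeholders, not from the original post):

```shell
# Merge many small files into one stream, then re-chop it into
# pieces of at most 64 MB each, named merged/m_aa, merged/m_ab, ...
mkdir -p merged
cat inputs/* | split -b 64m - merged/m_
```

The resulting pieces can then be uploaded to HDFS, where each one fits
within a single 64 MB block.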

yoav.morag wrote:
> hi all -
> can anyone comment on the performance cost of merging many small files into
> an increasingly large MapFile ? will that cost be dependent on the size of
> the larger MapFile (since I have to rewrite it) or is there a built-in
> strategy to split it into smaller parts, affecting only those which were
> touched ? 
> thanks -
> Yoav.
