hadoop-common-user mailing list archives

From Elia Mazzawi <elia.mazz...@casalemedia.com>
Subject Re: merging into MapFile
Date Tue, 09 Dec 2008 22:21:29 GMT
It has to do with the HDFS block size.

I had many small files, and performance became much better when I 
merged them.

The default block size is 64 MB, so merge your files into chunks of <= 64 MB 
(what I did and recommend),
or reconfigure your Hadoop:

<property>
  <name>dfs.block.size</name>
  <value>67108864</value>
  <description>The default block size for new files.</description>
</property>

Do something like:
cat * | rotatelogs ./merged/m 64M
It will merge and chop the data for you.
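If rotatelogs (the Apache httpd log-rotation tool) isn't installed, the standard split utility can do the same merge-and-chop. A minimal sketch, assuming the small files live in ./input and the pieces should go to ./merged (the tiny demo files and the 64k piece size are only so the example runs quickly; use -b 64m against real data to match the default dfs.block.size of 67108864 bytes):

```shell
#!/bin/sh
# Concatenate many small files and re-chop the stream into
# fixed-size pieces that line up with the HDFS block size.
mkdir -p input merged
# Stand-ins for the many small log files (1000 bytes each):
printf 'x%.0s' $(seq 1 1000) > input/part1
printf 'y%.0s' $(seq 1 1000) > input/part2
# For real data use: cat input/* | split -b 64m - merged/m
cat input/* | split -b 64k - merged/m
# Pieces come out as merged/maa, merged/mab, ... ready for
# something like: hadoop fs -put merged/ /user/you/data/
ls merged/
```

Unlike rotatelogs, split is available on essentially any Unix system, and the byte count is preserved exactly across the pieces.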

yoav.morag wrote:
> hi all -
> can anyone comment on the performance cost of merging many small files into
> an increasingly large MapFile ? will that cost be dependent on the size of
> the larger MapFile (since I have to rewrite it) or is there a built-in
> strategy to split it into smaller parts, affecting only those which were
> touched ? 
> thanks -
> Yoav.
>   

