incubator-chukwa-user mailing list archives

From Eric Yang <ey...@yahoo-inc.com>
Subject Re: 0.3.0 config file
Date Sun, 02 May 2010 17:57:49 GMT
Are you reprocessing from archive files?  The number of mappers is determined
by the number of input files, so a large number of small files will slow
things down quite a bit.  The right parameter values also depend on your
hardware.  The default settings in default.properties are used to generate
chukwa-demux-conf.xml, and they were set up for task trackers on machines
with 4GB of RAM.  You may want to increase the numbers if you have more RAM.
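
For illustration only, here is roughly what the three job parameters could
look like once the @TODO-...@ placeholders are filled in for a task tracker
with more memory.  The values are assumptions picked for the example, not the
shipped defaults and not tested recommendations.  Note that io.sort.mb has to
fit inside the heap of each task JVM, so the heap setting may need to grow
with it.

```xml
<!-- Hypothetical values for a task tracker with more than 4GB of RAM. -->
<!-- These numbers are illustrative assumptions; tune them for your cluster. -->
<property>
  <name>io.sort.mb</name>
  <value>256</value>
</property>

<property>
  <name>fs.inmemory.size.mb</name>
  <value>256</value>
</property>

<property>
  <name>io.sort.factor</name>
  <value>50</value>
</property>
```

A larger io.sort.factor merges more streams per pass but also opens more
file handles at once, so check the open-file limit on your nodes before
raising it much further.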

I hope your reducers don't write out a lot of data.  The output is currently
partitioned by data type, so it may take a long time to write the final
output if the reducers need to write TBs of data.  I filed a JIRA for
improving this a while ago:

https://issues.apache.org/jira/browse/CHUKWA-481

Hope this helps.

Regards,
Eric

On 5/2/10 6:51 AM, "Corbin Hoenes" <corbin@tynt.com> wrote:

> I'm reprocessing a bunch of data ~45 days ~70GB per day.  It's taking a
> while...is there some configuration that might help demux perform better when
> it's fed a lot of files?  I've noticed sort takes a long time when it's got
> too many maps.  Can I lower the amount of maps, etc...?
> 
> I saw this in the config but noticed the TODO comments.  Anything here I
> should configure?
> 
> <!-- Chukwa Job parameters -->
> <property>
>  <name>io.sort.mb</name>
>  <value>@TODO-DEMUX-IO-SORT-MB@</value>
>  <description>The total amount of buffer memory to use while sorting
>  files, in megabytes.  By default, gives each merge stream 1MB, which
>  should minimize seeks.</description>
> </property>
> 
> <property>
>  <name>fs.inmemory.size.mb</name>
>  <value>@TODO-DEMUX-FS-INMEMORY-SIZE_MB@</value>
>  <description>The size of the in-memory filesystem instance in MB</description>
> </property>
> 
> <property>
>  <name>io.sort.factor</name>
>  <value>@TODO-DEMUX-IO-SORT-FACTOR@</value>
>  <description>The number of streams to merge at once while sorting
>  files.  This determines the number of open file handles.</description>
> </property>
> 

