incubator-chukwa-user mailing list archives

From Corbin Hoenes <>
Subject Re: 0.3.0 config file
Date Mon, 03 May 2010 15:21:06 GMT

We are reprocessing from archive files.  We've experimented with how many files to move into the
/chukwa/logs directory at a time, both to get better speed and to control the number of mappers
the job uses.  I wondered whether this was configurable (how many .done files does the demux job
look for before starting a job?).  For now we just move 40 new ones in when there are fewer than
10 in the logs directory.

Our reducer is just the default reducer that we get from AbstractProcessor.

On the config file: I pasted the literal file from my setup.  So what is
<value>@TODO-DEMUX-IO-SORT-MB@</value> supposed to do?

Should I replace that placeholder with my own value, like 128 for 128MB?  Or is it something
I should only replace if I want to change this value?
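
To make the question concrete, this is the kind of edit I mean.  It is only a sketch: it assumes
the @TODO-DEMUX-IO-SORT-MB@ token is a plain placeholder meant to be swapped for a number, and 128
is just the example value mentioned above, not a tuned recommendation.

```xml
<!-- Sketch only: placeholder replaced by hand with an example value. -->
<property>
  <name>io.sort.mb</name>
  <!-- 128 (i.e. 128MB) is illustrative; size it to the task trackers' RAM -->
  <value>128</value>
  <description>The total amount of buffer memory to use while sorting
  files, in megabytes.  By default, gives each merge stream 1MB, which
  should minimize seeks.</description>
</property>
```

If that is the right idea, I assume the fs.inmemory.size.mb and io.sort.factor placeholders would
get the same treatment.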

On May 2, 2010, at 11:57 AM, Eric Yang wrote:

> Are you reprocessing from archive files?  The number of mappers is tied to
> the number of files that you have.  Hence, having small files would surely
> slow things down quite a bit.  The parameters also depend on your
> hardware.  The default settings in are used to generate
> chukwa-demux-conf.xml.  They were set up for task trackers on 4GB machines.
> You may want to increase the numbers if you have more RAM.
> I hope your reducers don't write out a lot of data; the output is currently
> partitioned by data type.  Hence, it may take a long time to write the final
> output if the reducers need to output TBs of data.  I filed a JIRA for
> improving this a while ago:
> Hope this helps.
> Regards,
> Eric
> On 5/2/10 6:51 AM, "Corbin Hoenes" <> wrote:
>> I'm reprocessing a bunch of data: ~45 days at ~70GB per day.  It's taking a
>> long time.  Is there some configuration that might help demux perform better when
>> it's fed a lot of files?  I've noticed sort takes a long time when it's got
>> too many maps.  Can I lower the number of maps, etc.?
>> I saw this in the config but noticed the TODO comments.  Anything here I
>> should configure?
>> <!-- Chukwa Job parameters -->
>> <property>
>> <name>io.sort.mb</name>
>> <value>@TODO-DEMUX-IO-SORT-MB@</value>
>> <description>The total amount of buffer memory to use while sorting
>> files, in megabytes.  By default, gives each merge stream 1MB, which
>> should minimize seeks.</description>
>> </property>
>> <property>
>> <name>fs.inmemory.size.mb</name>
>> <value>@TODO-DEMUX-FS-INMEMORY-SIZE_MB@</value>
>> <description>The size of the in-memory filesystem instance in MB</description>
>> </property>
>> <property>
>> <name>io.sort.factor</name>
>> <value>@TODO-DEMUX-IO-SORT-FACTOR@</value>
>> <description>The number of streams to merge at once while sorting
>> files.  This determines the number of open file handles.</description>
>> </property>
