chukwa-dev mailing list archives

From "Jerome Boulon (JIRA)" <>
Subject [jira] Commented: (CHUKWA-30) Remove HDFS flush & connection holding (Collector)
Date Thu, 14 May 2009 16:28:45 GMT


Jerome Boulon commented on CHUKWA-30:

Unit testing the collector is difficult since we don't have an end-to-end testing tool yet, but
this is something we are going to work on.
That being said, this code has been running for one week now, collecting System Metrics from 3700 machines.

What do you mean by "add the necessary/optional conf options to chukwa-collector-conf.xml.template"?
Do you mean activating the new Writer in place of the current one, or just adding all the properties
but commenting out the XML block?

We had to remove the 10-second lock (HDFS flush) for performance reasons.
We write to the local file system first because it tends to be more reliable
than writing across the network, and because we have a use case where people want to use the DataCollection
pipeline without HDFS at all.
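
To illustrate the local-first idea described above, here is a minimal sketch in Python. It is not the code from CHUKWA-30.patch: the class name, the function name, the file naming scheme, and the size-based rotation threshold are all assumptions made for illustration, and a plain local directory stands in for HDFS. Records are appended to a local spool file, the file is renamed once it is complete, and a separate step moves completed files to the final destination, so no remote connection is held or flushed while collecting.

```python
import os
import shutil
import tempfile


class LocalFirstWriter:
    """Hypothetical sketch: append records to a local spool file, rotate by size."""

    def __init__(self, spool_dir, rotate_bytes=1024):
        self.spool_dir = spool_dir
        self.rotate_bytes = rotate_bytes  # assumed threshold, not a Chukwa setting
        self.seq = 0
        self._open_new()

    def _open_new(self):
        # ".open" marks a file that is still being written.
        self.path = os.path.join(self.spool_dir, "sink-%06d.open" % self.seq)
        self.seq += 1
        self.written = 0
        self.fh = open(self.path, "wb")

    def _finish(self):
        # Close the current file and rename it so the mover can pick it up.
        self.fh.close()
        os.rename(self.path, self.path[: -len(".open")] + ".done")

    def add(self, record):
        self.fh.write(record + b"\n")
        self.written += len(record) + 1
        if self.written >= self.rotate_bytes:
            self._finish()
            self._open_new()

    def close(self):
        if self.written > 0:
            self._finish()
        else:
            # Nothing was written to the current file, so discard it.
            self.fh.close()
            os.remove(self.path)


def move_done_files(spool_dir, dest_dir):
    """Move completed files to dest_dir (a stand-in for the remote store)."""
    moved = []
    for name in sorted(os.listdir(spool_dir)):
        if name.endswith(".done"):
            shutil.move(os.path.join(spool_dir, name), os.path.join(dest_dir, name))
            moved.append(name)
    return moved


if __name__ == "__main__":
    spool, dest = tempfile.mkdtemp(), tempfile.mkdtemp()
    writer = LocalFirstWriter(spool, rotate_bytes=32)
    for i in range(10):
        writer.add(b"metric sample %d" % i)
    writer.close()
    print(move_done_files(spool, dest))
```

Because the mover only touches files that have already been renamed, the writer never waits on the destination, and the pipeline still works when the mover is not run at all, which matches the "without HDFS" use case.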

This gives me a 10X improvement compared to the default writer. To collect System Metrics
from 3700 machines, I had to run 5 collectors, and data was still late.
With the new collector, I've been able to handle all System Metrics for
all machines from a single running instance.
Also, Demux is more efficient since I have fewer and bigger dataSink files.

> Remove HDFS flush & connection holding (Collector)
> --------------------------------------------------
>                 Key: CHUKWA-30
>                 URL:
>             Project: Hadoop Chukwa
>          Issue Type: Improvement
>          Components: data collection
>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: CHUKWA-30.patch

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.