incubator-chukwa-user mailing list archives

From Eric Yang <>
Subject Re: Hbase over Chukwa demux
Date Wed, 17 Mar 2010 17:54:32 GMT
Hi Oded,

The current Chukwa demux uses one reducer per record type for output.  It
depends on your data model.  It may be worthwhile to look into running
multiple reducers per record type if your data has a lot of records for a
single record type.  I think the number of reduce tasks is set through
conf.setNumReduceTasks.  You can set more if you don't use ChukwaRecord
after demux.
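As a rough sketch of the knob mentioned above (where exactly demux wires this value in is not shown in this message, and the class/method names here are only illustrative of the classic org.apache.hadoop.mapred API of that era):

```java
import org.apache.hadoop.mapred.JobConf;

// Sketch only: a hypothetical helper showing where one might raise the
// reducer count for a classic-API Hadoop job such as demux.
public class DemuxReducerConfig {
    public static JobConf configure(JobConf conf) {
        // Demux defaults to one reducer per record type; a higher count
        // lets several reducers share the load of a hot record type.
        conf.setNumReduceTasks(8); // 8 is an arbitrary example value
        return conf;
    }
}
```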

The current demux needs some major updates, and patches are
welcome.  :)


On 3/17/10 10:00 AM, "Oded Rosen" <> wrote:

> I work with a Hadoop cluster that receives tons of new data each day.
> The data flows into Hadoop from outside servers via Chukwa.
> Chukwa has a tool called demux, a built-in mapred job.
> Chukwa users may write their own map & reduce classes for this demux, with the
> only limitation that the input & output types are Chukwa records - so I cannot
> use HBase's TableMap and TableReduce.
> To write data to HBase during this mapred job, I can only use
> table.put & table.commit, which work on one HBase row at a time (don't they?).
> This raises serious latency issues, as writing thousands of records to HBase
> this way every 5 minutes is not effective and really s-l-o-w.
> Even if I move the HBase writing from the map phase to the reduce phase,
> the same rows would still be written one at a time, so moving the ".put" to
> the reducer does not seem likely to change anything.
> I would like to write straight to HBase from the Chukwa demuxer, rather than
> run another job that reads the Chukwa output and writes it to HBase.
> The goal is to get this data into HBase as fast as possible.
> Is there a way to write efficiently to HBase without TableReduce? Have I
> misunderstood something?
> Has anyone using Chukwa managed to do this?
> Thanks in advance for any kind of help,
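One common way to reduce the per-row latency described in the question above is to buffer Puts on the client and flush them in batches. This is a hedged sketch, not from the original thread; it assumes an HBase 0.20-era client API (contemporary with this message), and the table name is made up:

```java
import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

// Sketch only: "demux_output" is a hypothetical table name, and this
// assumes the HBase 0.20-era client API. Requires a running cluster.
public class BatchedWriter {
    public static void writeBatch(List<Put> puts) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "demux_output");
        table.setAutoFlush(false);                 // buffer puts client-side
        table.setWriteBufferSize(4 * 1024 * 1024); // flush roughly every 4 MB
        table.put(puts);         // batched into far fewer RPCs than put-per-row
        table.flushCommits();    // push whatever is still buffered
        table.close();
    }
}
```

With auto-flush disabled, thousands of rows per 5-minute demux run go out in a handful of buffered round trips instead of one RPC per row, which is usually the dominant cost in the pattern the question describes.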
