incubator-chukwa-user mailing list archives

From Oded Rosen
Subject Hbase over Chukwa demux
Date Wed, 17 Mar 2010 17:00:18 GMT
I work with a Hadoop cluster that receives tons of new data each day.
The data flows into Hadoop from outside servers, using Chukwa.

Chukwa has a tool called demux, a built-in mapred job.
Chukwa users may write their own map & reduce classes for this demux, with
the only limitation that the input & output types are Chukwa records - so I
cannot use HBase's TableMap, TableReduce.
In order to write data to HBase during this mapred job, I can only use
table.put & table.commit, which work on one HBase row at a time (don't they?).
This raises serious latency issues, as writing thousands of records to HBase
this way every 5 minutes is inefficient and really s-l-o-w.
Even if I move the HBase writing from the map phase to the reduce phase,
the same rows would have to be updated, so moving the ".put" to the reducer
is not likely to change anything.
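
For illustration, the per-record write pattern described above could in
principle be replaced by client-side batching, so that many rows go out in one
call instead of one call per row. The sketch below is only an assumption about
how that might look: RowSink and BufferedRowWriter are hypothetical names, not
part of Chukwa or HBase, and the sink stands in for whatever actually talks to
HBase. The HBase client itself is understood to offer a write buffer
(HTable.setAutoFlush(false) together with flushCommits()), but that should be
checked against the HBase version in use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the code that performs the real write. */
interface RowSink {
    // Each call represents one round trip to the store.
    void writeBatch(List<String[]> rows);
}

/** Collects {rowKey, value} pairs and hands them to the sink in batches. */
class BufferedRowWriter {
    private final RowSink sink;
    private final int batchSize;
    private final List<String[]> buffer = new ArrayList<>();

    BufferedRowWriter(RowSink sink, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.sink = sink;
        this.batchSize = batchSize;
    }

    /** Buffers one row; flushes automatically once the batch is full. */
    void put(String rowKey, String value) {
        buffer.add(new String[] {rowKey, value});
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered; does nothing if the buffer is empty. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Pass a copy so the sink never sees the buffer being cleared.
        sink.writeBatch(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

In a demux map or reduce class, put() would be called once per record and
flush() once when the task closes, so that the last partial batch is not lost.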

I would like to write straight to HBase from the Chukwa demuxer, rather than
have another job that reads the Chukwa output and writes it to HBase.
The goal is to get this data into HBase as fast as possible.

Is there a way to write efficiently to HBase without TableReduce? Have I got
something wrong?
Has anyone using Chukwa managed to do this?

Thanks in advance for any kind of help,
