incubator-chukwa-user mailing list archives

From Jerome Boulon <jbou...@netflix.com>
Subject Re: Hbase over Chukwa demux
Date Wed, 17 Mar 2010 17:12:56 GMT
Hi,
I have a new Demux that uses something similar to MultipleOutputFormat, and
one of my outputs is a Hive SeqFile (written directly from Demux).
So I guess it should not be difficult to add a specific OutputFormat
for HBase.
Do you have any special requirements other than being able to output to
HBase?

/Jerome.
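
The idea in the reply above, one job feeding several kinds of output, can be sketched roughly as follows. This is plain Java and is not Chukwa or Hadoop code: RecordDestination, InMemoryDestination, MultiOutputRouter and the record type names are hypothetical, chosen only to illustrate routing each record to a destination by its type. A real version would presumably sit behind a Hadoop OutputFormat, with one destination writing files and another writing to an HBase table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical destination for records; a real one might write a
// SequenceFile or an HBase table.
interface RecordDestination {
    void write(String key, String value);
}

// Test double that just remembers what it was asked to write.
class InMemoryDestination implements RecordDestination {
    final List<String> lines = new ArrayList<>();

    public void write(String key, String value) {
        lines.add(key + "=" + value);
    }
}

// Routes each record to the destination registered for its type,
// falling back to a default destination for unregistered types.
class MultiOutputRouter {
    private final Map<String, RecordDestination> byType = new HashMap<>();
    private final RecordDestination fallback;

    MultiOutputRouter(RecordDestination fallback) {
        this.fallback = fallback;
    }

    void register(String recordType, RecordDestination destination) {
        byType.put(recordType, destination);
    }

    void write(String recordType, String key, String value) {
        byType.getOrDefault(recordType, fallback).write(key, value);
    }
}

class MultiOutputSketch {
    public static void main(String[] args) {
        InMemoryDestination files = new InMemoryDestination(); // default output
        InMemoryDestination table = new InMemoryDestination(); // stands in for HBase
        MultiOutputRouter router = new MultiOutputRouter(files);
        router.register("ClickLog", table);

        router.write("ClickLog", "row-1", "a"); // goes to the table stand-in
        router.write("SysLog", "row-2", "b");   // goes to the default output
        System.out.println("table=" + table.lines + " files=" + files.lines);
        // prints: table=[row-1=a] files=[row-2=b]
    }
}
```

Keeping the routing decision in one place means a new destination can be added without touching the map or reduce classes.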

On 3/17/10 10:00 AM, "Oded Rosen" <oded@legolas-media.com> wrote:

> I work with a Hadoop cluster that receives tons of new data each day.
> The data flows into Hadoop from outside servers via Chukwa.
> 
> Chukwa has a tool called demux, a built-in mapred job.
> Chukwa users may write their own map & reduce classes for this demux, with the
> only limitation that the input & output types are Chukwa records - I cannot
> use HBase's TableMap, TableReduce.
> In order to write data to HBase during this mapred job, I can only use
> table.put & table.commit, which work on one HBase row at a time (don't they?).
> This raised serious latency issues, as writing thousands of records to HBase
> this way every 5 minutes is inefficient and really s-l-o-w.
> Even if I move the HBase writes from the map phase to the reduce phase,
> the same rows would have to be updated, so moving the ".put" to the reducer
> is not expected to change anything.
> 
> I would like to write straight to HBase from the Chukwa demuxer, rather than
> have another job that reads the Chukwa output and writes it to HBase.
> The goal is to get this data into HBase as fast as I can.
> 
> Is there a way to write efficiently to HBase without TableReduce? Have I got
> something wrong?
> Is there anyone using Chukwa who has managed to do this?
> 
> 
> Thanks in advance for any kind of help,
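
The slowness described in the quoted message is typical of paying one round trip per row. A common remedy is to buffer rows on the client and send them in batches. The HBase client of that period had a client-side write buffer for this purpose (HTable.setAutoFlush(false) followed by flushCommits()), though its availability and behaviour should be checked against the HBase version actually in use. The sketch below does not call HBase at all: RowSink, CountingSink, BufferedRowWriter and the batch size of 500 are hypothetical and only illustrate the batching pattern.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the storage client; a real implementation
// would wrap an HBase table and send the whole batch in one call.
interface RowSink {
    void writeBatch(List<String[]> rows); // models one round trip per call
}

// Sink that only counts, so the effect of batching is visible.
class CountingSink implements RowSink {
    int roundTrips = 0;
    int rowsWritten = 0;

    public void writeBatch(List<String[]> rows) {
        roundTrips++;
        rowsWritten += rows.size();
    }
}

// Collects rows and hands them to the sink in batches instead of one at a time.
class BufferedRowWriter {
    private final RowSink sink;
    private final int batchSize;
    private final List<String[]> buffer = new ArrayList<>();

    BufferedRowWriter(RowSink sink, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.sink = sink;
        this.batchSize = batchSize;
    }

    void put(String rowKey, String value) {
        buffer.add(new String[] {rowKey, value});
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is buffered; does nothing when the buffer is empty.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.writeBatch(new ArrayList<>(buffer));
        buffer.clear();
    }
}

class BufferedPutSketch {
    public static void main(String[] args) {
        CountingSink sink = new CountingSink();
        BufferedRowWriter writer = new BufferedRowWriter(sink, 500);
        for (int i = 0; i < 1200; i++) {
            writer.put("row-" + i, "value-" + i);
        }
        writer.flush(); // final partial batch
        System.out.println(sink.rowsWritten + " rows in " + sink.roundTrips + " round trips");
        // prints: 1200 rows in 3 round trips
    }
}
```

In a map or reduce task, the final flush would belong in the task's close or cleanup method so that the last partial batch is not lost.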
