hbase-user mailing list archives

From Oded Rosen <o...@legolas-media.com>
Subject Re: Hbase over Chukwa demux
Date Wed, 17 Mar 2010 17:31:17 GMT
Well, the best solution for my case would be a demux process that outputs
both a Chukwa record (a regular text/sequence file would be even better) AND
writes to HBase (so a multiple-output format would be great for me).
Also, the HBase writer should query the current data in HBase (read the
same row it is about to update) and use it as a reference for the update.

If those two things work, I'll be a happy man.
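
To illustrate the second requirement (read the current row, then update it) without one round trip per record, here is a minimal Java sketch. It is not Chukwa or HBase code: RowStore, InMemoryRowStore, and BufferedRowUpdater are hypothetical names, the in-memory map only stands in for an HBase table, and summing counters per column is an assumed merge rule.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an HBase table (NOT the HBase client API).
interface RowStore {
    Map<String, Long> get(String rowKey);               // current columns, empty if absent
    void putAll(Map<String, Map<String, Long>> rows);   // one batched write
}

// In-memory implementation so the sketch runs without a cluster.
class InMemoryRowStore implements RowStore {
    final Map<String, Map<String, Long>> rows = new HashMap<String, Map<String, Long>>();
    int batchWrites = 0;

    @Override
    public Map<String, Long> get(String rowKey) {
        Map<String, Long> row = rows.get(rowKey);
        if (row == null) {
            return new HashMap<String, Long>();
        }
        return new HashMap<String, Long>(row);  // defensive copy
    }

    @Override
    public void putAll(Map<String, Map<String, Long>> batch) {
        batchWrites++;
        for (Map.Entry<String, Map<String, Long>> e : batch.entrySet()) {
            rows.put(e.getKey(), new HashMap<String, Long>(e.getValue()));
        }
    }
}

// Buffers per-row deltas; on flush, reads each row once and writes one batch.
class BufferedRowUpdater {
    private final RowStore store;
    private final int maxBufferedRows;
    private final Map<String, Map<String, Long>> pending =
            new LinkedHashMap<String, Map<String, Long>>();

    BufferedRowUpdater(RowStore store, int maxBufferedRows) {
        this.store = store;
        this.maxBufferedRows = maxBufferedRows;
    }

    void add(String rowKey, String column, long delta) {
        // Updates to the same row are combined in memory first.
        pending.computeIfAbsent(rowKey, k -> new HashMap<String, Long>())
               .merge(column, delta, Long::sum);
        if (pending.size() >= maxBufferedRows) {
            flush();
        }
    }

    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        Map<String, Map<String, Long>> batch = new LinkedHashMap<String, Map<String, Long>>();
        for (Map.Entry<String, Map<String, Long>> e : pending.entrySet()) {
            // Read the existing row and use it as the reference for the update.
            Map<String, Long> merged = store.get(e.getKey());
            for (Map.Entry<String, Long> d : e.getValue().entrySet()) {
                merged.merge(d.getKey(), d.getValue(), Long::sum);  // assumed merge rule
            }
            batch.put(e.getKey(), merged);
        }
        store.putAll(batch);  // one write for all buffered rows
        pending.clear();
    }
}

class BufferedRowUpdaterDemo {
    public static void main(String[] args) {
        InMemoryRowStore store = new InMemoryRowStore();
        BufferedRowUpdater updater = new BufferedRowUpdater(store, 1000);
        updater.add("host1", "errors", 2);
        updater.add("host1", "errors", 3);
        updater.flush();
        System.out.println(store.get("host1"));
    }
}
```

Against a real table, the same shape would mean one read and one write per distinct row in each flush rather than one write per record, and flush() would also have to be called when the task finishes so that buffered updates are not lost.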

On Wed, Mar 17, 2010 at 7:12 PM, Jerome Boulon <jboulon@netflix.com> wrote:

>  Hi,
> I have a new Demux that uses something similar to MultipleOutputFormat, and
> one of my outputs is a Hive SeqFile (directly from Demux).
> So I guess that it should not be difficult to get a specific OutputFormat
> for HBase.
> Do you have any special requirements other than being able to output to
> HBase?
>
> /Jerome.
>
>
> On 3/17/10 10:00 AM, "Oded Rosen" <oded@legolas-media.com> wrote:
>
> I work with a Hadoop cluster that receives tons of new data each day.
> The data flows into Hadoop from outside servers, using Chukwa.
>
> Chukwa has a tool called demux, a built-in mapred job.
> Chukwa users may write their own map & reduce classes for this demux, with
> the only limitation being that the input & output types are Chukwa records,
> so I cannot use HBase's TableMap, TableReduce.
> In order to write data to HBase during this mapred job, I can only use
> table.put & table.commit, which work on one HBase row at a time (don't they?).
> This raises serious latency issues, as writing thousands of records to
> HBase this way every 5 minutes is inefficient and really s-l-o-w.
> Even if I move the HBase writing from the map phase to the reduce phase,
> the same rows would be updated, so moving the ".put" to the reducer does
> not seem likely to change anything.
>
> I would like to write straight to HBase from the Chukwa demuxer, rather than
> have another job that reads the Chukwa output and writes it to HBase.
> The goal is to get this data into HBase as fast as I can.
>
> Is there a way to write efficiently to HBase without TableReduce? Have I
> got something wrong?
> Has anyone using Chukwa managed to do this?
>
>
> Thanks in advance for any kind of help,
>
>


-- 
Oded
