Date: Wed, 17 Mar 2010 19:00:18 +0200
Message-ID: <1703587b1003171000x63f3ad7av37e9975b5347a2fe@mail.gmail.com>
Subject: Hbase over Chukwa demux
From: Oded Rosen <oded@legolas-media.com>
To: hbase-user@hadoop.apache.org, chukwa-user@hadoop.apache.org

I work with a Hadoop cluster that takes in tons of new data each day.
The data flows into Hadoop from outside servers, using Chukwa.

Chukwa has a tool called demux, a built-in MapReduce job.
Chukwa users may write their own map and reduce classes for this demux, with the only limitation that the input and output types must be ChukwaRecords - so I cannot use HBase's TableMap or TableReduce.
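For context, the reducer-side hook that demux offers looks roughly like this, as I understand it from the Chukwa source (paraphrased from memory, so treat the package and exact signatures as approximate):

    // Reducer-side extension point of Chukwa demux, roughly as in
    // org.apache.hadoop.chukwa.extraction.demux.processor.reducer
    // (paraphrased from memory; exact signatures may differ).
    import java.util.Iterator;

    import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecord;
    import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecordKey;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public interface ReduceProcessor {
      String getDataType();

      // Both the incoming values and the collected output are ChukwaRecords,
      // so there is no room here for HBase's TableMap/TableReduce types.
      void process(ChukwaRecordKey key,
                   Iterator<ChukwaRecord> values,
                   OutputCollector<ChukwaRecordKey, ChukwaRecord> output,
                   Reporter reporter);
    }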
In order to write data to HBase during this MapReduce job, I can only use table.put and table.commit, which work on one HBase row at a time (don't they?).
This raises serious latency issues, as writing thousands of records to HBase this way every 5 minutes is not effective and really s-l-o-w.
Even if I move the HBase writes from the map phase to the reduce phase, the same rows would be updated, so moving the ".put" to the reducer does not seem likely to change anything.

I would like to write straight to HBase from the Chukwa demuxer, and not have another job that reads the Chukwa output and writes it to HBase.
The target is to get this data into HBase as fast as I can.

Is there a way to write to HBase efficiently without TableReduce? Have I got something wrong?
Is there someone using Chukwa who has managed to do this?
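The only workaround I can think of is turning off autoflush so the client buffers puts and sends them in bulk, something like the sketch below (again with invented names, and I am not sure it is enough):

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchedDemuxWriter {
      private final HTable table;

      public BatchedDemuxWriter() throws IOException {
        table = new HTable(new HBaseConfiguration(), "chukwa_records");
        table.setAutoFlush(false);                  // buffer puts client-side
        table.setWriteBufferSize(12 * 1024 * 1024); // send in ~12 MB batches
      }

      public void write(String rowKey, byte[] value) throws IOException {
        Put put = new Put(Bytes.toBytes(rowKey));
        put.add(Bytes.toBytes("data"), Bytes.toBytes("body"), value);
        table.put(put); // buffered; sent in bulk when the buffer fills
      }

      public void close() throws IOException {
        table.flushCommits(); // push whatever is still buffered
        table.close();
      }
    }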


Thanks in advance for any kind of help,
--
Oded