Date: Wed, 17 Mar 2010 10:54:32 -0700
Subject: Re: Hbase over Chukwa demux
From: Eric Yang
Reply-To: chukwa-user@hadoop.apache.org

Hi Oded,

The current Chukwa Demux uses one reducer per record type for output. It depends on your data model. If your data has a lot of records for a single record type, it may be worthwhile to look into running multiple reducers per record type. I think conf.setNumReduceTasks is set in org.apache.hadoop.chukwa.extraction.demux.Demux.java. You can set it higher if you don't use ChukwaRecord after demux.

The current demux needs some major updates, and patches are welcome. :)

Regards,
Eric

On 3/17/10 10:00 AM, "Oded Rosen" wrote:

> I work with a hadoop cluster with tons of new data each day.
> The data is flowing into hadoop from outside servers, using chukwa.
>
> Chukwa has a tool called demux, a builtin mapred job.
> Chukwa users may write their own map & reduce classes for this demux, with the
> only limitation that the input & output types are chukwa records - I cannot
> use HBase's TableMap, TableReduce.
> In order to write data to hbase during this mapred job, I can only use the
> table.put & table.commit, which work on one hbase row only (don't they?).
> This raised serious latency issues, as writing thousands of records to hbase
> this way every 5 minutes is not effective and really s-l-o-w.
> Even if I move the hbase writing from the map phase to the reduce phase,
> the same rows would still be updated, so moving the ".put" to the reducer
> does not seem likely to change anything.
>
> I would like to write straight to hbase from the chukwa demuxer, and not to
> have another job that reads the chukwa output and writes it to hbase.
> The goal is to have this data in hbase as fast as I can.
>
> Is there a way to write effectively to hbase without TableReduce? Have I got
> something wrong?
> Is there someone using Chukwa who has managed to do this?
>
>
> Thanks in advance for any kind of help,
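
The suggestion in the reply, running multiple reducers per record type, can be sketched as a partitioning function. This is only an illustration in plain Java under stated assumptions: the class RecordTypePartitioner and its methods are hypothetical and are not part of Chukwa, Hadoop, or HBase, and the record type names are made up. Each record type is given a contiguous range of reducer slots, and a row key is hashed into that range, so the same row always lands on the same reducer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, NOT a Chukwa or Hadoop class: it only shows how one
// record type could be spread over several reducers instead of exactly one.
final class RecordTypePartitioner {
    // First reducer index assigned to each record type.
    private final Map<String, Integer> firstReducer = new HashMap<>();
    // Number of reducers assigned to each record type.
    private final Map<String, Integer> width = new HashMap<>();
    private int totalReducers = 0;

    // Reserve `reducers` consecutive reducer slots for a record type.
    void register(String recordType, int reducers) {
        if (reducers < 1) {
            throw new IllegalArgumentException("need at least one reducer");
        }
        if (firstReducer.containsKey(recordType)) {
            throw new IllegalArgumentException("already registered: " + recordType);
        }
        firstReducer.put(recordType, totalReducers);
        width.put(recordType, reducers);
        totalReducers += reducers;
    }

    // The value a job would pass as its reducer count.
    int totalReducers() {
        return totalReducers;
    }

    // Map (recordType, rowKey) to a reducer index inside the type's range.
    int partition(String recordType, String rowKey) {
        Integer first = firstReducer.get(recordType);
        if (first == null) {
            throw new IllegalArgumentException("unknown record type: " + recordType);
        }
        // floorMod keeps the slot non-negative even for negative hash codes.
        int slot = Math.floorMod(rowKey.hashCode(), width.get(recordType));
        return first + slot;
    }

    public static void main(String[] args) {
        RecordTypePartitioner p = new RecordTypePartitioner();
        p.register("SysLog", 4); // a high-volume type gets four reducers
        p.register("Df", 1);     // a low-volume type keeps a single reducer
        System.out.println("reducers needed: " + p.totalReducers());
        System.out.println("SysLog/host1 -> reducer " + p.partition("SysLog", "host1"));
    }
}
```

In a real job, totalReducers() would be the number handed to conf.setNumReduceTasks, and the partition logic would live in the job's partitioner. Because a row key always maps to the same reducer, all updates to one row stay together, but the output for a single record type is then split across several reducers.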