From: Eric Yang
To: "chukwa-user@incubator.apache.org"
Date: Wed, 1 Jun 2011 13:33:28 -0700
Subject: Re: speeding up demux
In-Reply-To: <0F4124E6-8303-4A20-A356-D1D737D16C43@tynt.com>
Reply-To: chukwa-user@incubator.apache.org
Mailing-List: contact chukwa-user-help@incubator.apache.org; run by ezmlm
Content-Type: multipart/alternative; boundary="_000_CA0BF02812414eyangyahooinccom_"
MIME-Version: 1.0

--_000_CA0BF02812414eyangyahooinccom_
Content-Type: text/plain; charset="iso-8859-1"

Hi James,

1) Trunk is more stable than any previous release, but it needs more documentation.
2) Performance is the same for the sequence file writer, and data availability is 200-300X faster if the data is streamed to HBase.
3) Check out http://svn.apache.org/repos/asf/incubator/chukwa/trunk/CHANGES.txt
4) Yes it does. Let us know if there are any questions.

The setup instructions are located at: http://wiki.apache.org/hadoop/Chukwa_Quick_Start

Hope it works for you. :)

Regards,
Eric

On 6/1/11 1:01 PM, "James Seigel" <james@tynt.com> wrote:

Hello!

I am seriously considering what you are suggesting in this email, even though it goes against what would seem to make sense. I have a couple of questions if anyone has the time to answer.

1) How stable is trunk right now?
2) Any performance improvements/degradations since 0.3?
3) Is there a pseudo change log between “trunk” and 0.4 that I could take a peek at at this point?
4) Does it compile? ;)

Cheers and thanks for your time!

James.


On 2011-05-27, at 9:58 AM, Eric Yang wrote:

I would recommend skipping Chukwa 0.4 and going to the trunk. In addition, use HBaseWriter to stream data into HBase in parallel, so that the data can be processed in near real time for demux.

Regards,
Eric

On 5/26/11 8:30 PM, "Bill Graham" <billgraham@gmail.com> wrote:

This seems possible, but one thing that would need to be changed is the directories that demux uses. For example:
demuxProcessing/mrInput
demuxProcessing/mrOutput

These would need to be dynamic directories with the timestamp or something else in them to keep two jobs from interfering with each other.
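
A minimal sketch of the idea above, in Java, assuming each demux job gets its own input and output directory by appending a start timestamp and a job index to the existing names. The class DemuxJobDirs, the forJob method, and the suffix format are hypothetical and not part of Chukwa; only the demuxProcessing/mrInput and demuxProcessing/mrOutput names come from this thread.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper (not part of Chukwa): per-job demux directory names. */
public final class DemuxJobDirs {
    // UTC timestamp with millisecond precision, e.g. 20110526203000000.
    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS").withZone(ZoneOffset.UTC);

    public final String mrInput;
    public final String mrOutput;

    private DemuxJobDirs(String mrInput, String mrOutput) {
        this.mrInput = mrInput;
        this.mrOutput = mrOutput;
    }

    /**
     * Builds paths of the form baseDir/mrInput_STAMP_JOBID and
     * baseDir/mrOutput_STAMP_JOBID. The job index keeps two jobs apart
     * even when they start within the same millisecond.
     */
    public static DemuxJobDirs forJob(String baseDir, Instant startedAt, int jobId) {
        String suffix = "_" + STAMP.format(startedAt) + "_" + jobId;
        return new DemuxJobDirs(baseDir + "/mrInput" + suffix,
                                baseDir + "/mrOutput" + suffix);
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        DemuxJobDirs first = forJob("demuxProcessing", now, 0);
        DemuxJobDirs second = forJob("demuxProcessing", now, 1);
        System.out.println(first.mrInput + "  " + first.mrOutput);
        System.out.println(second.mrInput + "  " + second.mrOutput);
    }
}
```

This only covers naming. Whatever launches the jobs would still have to move each batch of data into the matching input directory and clean up afterwards.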

On Thu, May 26, 2011 at 8:23 PM, Corbin Hoenes <corbin@tynt.com> wrote:
Finding demux to be a bit too slow for our needs. It seems like only 1 runs at a time; is there some technical reason why we couldn't run a couple in parallel? If so, any hints on how difficult it would be to run multiple demuxers at a time?






--_000_CA0BF02812414eyangyahooinccom_--