From: Guillermo Pérez <bisho@tuenti.com>
Date: Sun, 28 Feb 2010 11:26:25 +0100
Subject: Re: Directly create chukwa records?
To: chukwa-user@hadoop.apache.org

>> One related thing is that I want to modify the "cluster" where we put
>> the files, because we will receive syslog data with several types of
>> events that we want to store in different clusters to analyze, back
>> up, and archive separately. I have seen that you can modify the
>> Record.tagsField and that a regexp is used to extract the destination
>> cluster. This is a bit awkward, isn't it? I don't want to keep a
>> tagsField just for that. I'm using a field "event_type" and I have
>> modified extraction/engine/RecordUtil.java so that if that field
>> exists, "event_" + <event_type> is used as the cluster. Is this the
>> proper way to go, or is there a better solution for this?
>
> I don't think you need to modify RecordUtil.java for this purpose. The
> backfill java program takes the cluster as its first parameter. Hence,
> you could easily pass the event_type as the first parameter before you
> backfill.

Not really. The backfill uses the given cluster to populate the Chunk
data, which is then inserted into the Record.tagsField. If that
tagsField is not present, the cluster used is "unknown".

And I'm collecting data of different types (page views, business
actions) that should be stored in different clusters, so one log loaded
with backfill may contain more than one destination cluster. That's why
I modified RecordUtil to tweak how the destination cluster is decided,
based on some fields of my ChukwaRecords (roughly as sketched below).
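Roughly, the change looks like this. It is a sketch from memory: the
regexp and method shape approximate the stock RecordUtil rather than
copying it, and it assumes the Record.getValue API from
extraction/engine; "event_type" is my custom field.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.chukwa.extraction.engine.Record;

public class RecordUtil {
    // Approximation of the stock tag parsing: pull cluster="..." out of
    // the tags field.
    private static final Pattern CLUSTER_PATTERN =
        Pattern.compile(".*cluster=\"(.*?)\".*");

    public static String getClusterName(Record record) {
        // The tweak: a custom "event_type" field, when present, decides
        // the destination cluster directly.
        String eventType = record.getValue("event_type");
        if (eventType != null) {
            return "event_" + eventType;
        }
        // Otherwise fall back to extracting the cluster from the tags.
        String tags = record.getValue(Record.tagsField);
        if (tags != null) {
            Matcher m = CLUSTER_PATTERN.matcher(tags);
            if (m.matches()) {
                return m.group(1);
            }
        }
        return "unknown"; // no tags, or no cluster tag present
    }
}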
>> Another question is where I could start looking on how to build
>> reports and aggregated results from the custom ChukwaRecords I'm
>> inserting.
>
> There is currently no formal solution to generate reports from
> ChukwaRecords. There is
> org.apache.hadoop.chukwa.dataloader.MetricDataLoader, which loads
> ChukwaRecords into a MySQL database based on the mdl.xml file. After
> the data is loaded, you could use hicc.sh to start the webserver and
> visualize the data in the Chukwa SQL Client widget. However, I must
> warn you that MetricDataLoader is deprecated, and the future plan to
> generate reports from ChukwaRecords is as follows:
>
> Have a post-demux data loader which waits to receive new ChukwaRecords
> files and merges them with the existing ChukwaRecords files through a
> second MR job. The second MR job also produces a low-resolution version
> of the data for reports.
>
> /chukwa/repos/TYPE/DATE <-- Original data goes here.
> /chukwa/report/TYPE/[yearly,monthly,weekly,daily] <-- Summarized JSON
> data goes here.
>
> The report JSON will be fixed to 300 data points per series, optimized
> for graphing. I am taking it slow on the actual implementation because
> ChukwaRecords should be moved to a faster serialization format. It's
> another area that needs to be improved for the future plan to work.

Hmm, really interesting! I will keep an eye on that, thanks a lot.
Meanwhile I will investigate how to build reports with Pig and dump
them to a MySQL server. Is there any doc on how HICC does its own
aggregations? Can I plug new things in there easily?

> Regards,
> Eric

--
Guille -ℬḭṩḩø-
:wq
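As a rough illustration of the fixed-resolution reports Eric describes
above: the class and method below are hypothetical, not Chukwa code,
but a report generator along those lines would call something like this
with points = 300 per series before writing the summarized JSON.

import java.util.ArrayList;
import java.util.List;

public final class SeriesDownsampler {
    // Reduce a series to at most `points` values by averaging
    // equal-width buckets, which is what a fixed number of data points
    // per series implies.
    public static List<Double> downsample(List<Double> series, int points) {
        if (series.size() <= points) {
            return new ArrayList<>(series); // already at or below target resolution
        }
        List<Double> out = new ArrayList<>(points);
        double bucketWidth = (double) series.size() / points;
        for (int b = 0; b < points; b++) {
            int start = (int) Math.floor(b * bucketWidth);
            int end = (int) Math.floor((b + 1) * bucketWidth);
            double sum = 0.0;
            for (int i = start; i < end; i++) {
                sum += series.get(i);
            }
            out.add(sum / (end - start)); // one averaged data point per bucket
        }
        return out;
    }
}

Averaging is just one choice of bucket aggregate; min/max pairs per
bucket would preserve spikes better for graphing.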