Mailing-List: contact chukwa-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: chukwa-user@hadoop.apache.org
Delivered-To: mailing list chukwa-user@hadoop.apache.org
MIME-Version: 1.0
Reply-To: billgraham@gmail.com
In-Reply-To: <39b0afc01001191559v6d868970se41252b2d06f337f@mail.gmail.com>
References: <449b48761001191546s7988ca21s50f7f995b68f6ce5@mail.gmail.com>
 <39b0afc01001191559v6d868970se41252b2d06f337f@mail.gmail.com>
Date: Tue, 19 Jan 2010 16:53:09 -0800
Message-ID: <449b48761001191653j509b8523q259b1974a0bca0c@mail.gmail.com>
Subject: Re: how to generate a Chukwa SequenceFile
From: Bill Graham
To: Ariel Rabkin
Cc: chukwa-user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=00504502ae2e984d38047d8e0280

--00504502ae2e984d38047d8e0280
Content-Type: text/plain; charset=ISO-8859-1

Thanks Ari, that helps. The TempFileUtil.writeASinkFile method seems similar
to what I want actually.

From looking at the code though it seems that a sink file contains
ChukwaArchiveKey -> ChunkImpl key value pairs, but a processed file instead
contains ChukwaRecordKey -> ChukwaRecord pairs.

If I followed that code as an example, but just created the latter k/v pairs
instead of the former I'd be good to go, correct?

On Tue, Jan 19, 2010 at 3:59 PM, Ariel Rabkin wrote:

> There isn't a polished utility for this, and there should be.  I think
> it'll be entirely straightforward, depending on your specific
> requirements.
>
> If you look in org.apache.hadoop.chukwa.util.TempFileUtil.RandSeqFileWriter
> there's an example of code that writes out a sequence file for test
> purposes.
>
> --Ari
>
> On Tue, Jan 19, 2010 at 3:46 PM, Bill Graham wrote:
> > Hi,
> >
> > Is there an easy way (maybe using a utility class or the chukwa API) to
> > manually create a sequence file of chukwa records from a log file without
> > the need for HDFS?
> >
> > My use case is this: I've got pig unit tests that read input sequence file
> > input using ChukwaStorage from local disk. I generated these files by
> > putting data into the cluster and waiting for the data processor to run.
> > We're looking to change the log format though, and I'd like to be able to
> > write and run the unit tests without putting the new data into the cluster.
> >
> > If there were a command line way that I could do this that would be very
> > helpful. Or if anyone could point me to the relevant classes, I could write
> > such a utility and contribute it back.
> >
> > thanks,
> > Bill
> >
>
>
>
> --
> Ari Rabkin asrabkin@gmail.com
> UC Berkeley Computer Science Department
>
--00504502ae2e984d38047d8e0280--