incubator-chukwa-user mailing list archives

From Bill Graham <billgra...@gmail.com>
Subject Re: how to generate a Chukwa SequenceFile
Date Fri, 22 Jan 2010 00:01:03 GMT
Here's a JIRA with a patch. Let me know if you think I should refactor any
parts of it:

https://issues.apache.org/jira/browse/CHUKWA-449

On Tue, Jan 19, 2010 at 6:03 PM, Ariel Rabkin <asrabkin@gmail.com> wrote:

> Yes, if by processing you mean "demux".  Which should be renamed, I
> think, at some point.
>
> --Ari
>
> On Tue, Jan 19, 2010 at 4:53 PM, Bill Graham <billgraham@gmail.com> wrote:
> > Thanks Ari, that helps. The TempFileUtil.writeASinkFile method seems
> > similar to what I want actually.
> >
> > From looking at the code though it seems that a sink file contains
> > ChukwaArchiveKey -> ChunkImpl key value pairs, but a processed file
> > instead contains ChukwaRecordKey -> ChukwaRecord pairs.
> >
> > If I followed that code as an example, but just created the latter k/v
> > pairs instead of the former I'd be good to go, correct?
> >
> >
> > On Tue, Jan 19, 2010 at 3:59 PM, Ariel Rabkin <asrabkin@gmail.com> wrote:
> >>
> >> There isn't a polished utility for this, and there should be.  I think
> >> it'll be entirely straightforward, depending on your specific
> >> requirements.
> >>
> >> If you look in
> >> org.apache.hadoop.chukwa.util.TempFileUtil.RandSeqFileWriter
> >> there's an example of code that writes out a sequence file for test
> >> purposes.
> >>
> >> --Ari
> >>
> >> On Tue, Jan 19, 2010 at 3:46 PM, Bill Graham <billgraham@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > Is there an easy way (maybe using a utility class or the chukwa API) to
> >> > manually create a sequence file of chukwa records from a log file
> >> > without the need for HDFS?
> >> >
> >> > My use case is this: I've got Pig unit tests that read sequence file
> >> > input using ChukwaStorage from local disk. I generated these files by
> >> > putting data into the cluster and waiting for the data processor to run.
> >> > We're looking to change the log format though, and I'd like to be able
> >> > to write and run the unit tests without putting the new data into the
> >> > cluster.
> >> >
> >> > If there were a command line way that I could do this that would be very
> >> > helpful. Or if anyone could point me to the relevant classes, I could
> >> > write such a utility and contribute it back.
> >> >
> >> > thanks,
> >> > Bill
> >> >
> >>
> >>
> >>
> >> --
> >> Ari Rabkin asrabkin@gmail.com
> >> UC Berkeley Computer Science Department
> >
> >
>
>
>
> --
> Ari Rabkin asrabkin@gmail.com
> UC Berkeley Computer Science Department
>
