incubator-chukwa-user mailing list archives

From Ariel Rabkin
Subject Re: Chunk sequence IDs other than byte counts?
Date Thu, 11 Mar 2010 02:09:04 GMT

The agent and collector code assumes only that IDs go up monotonically,
so data from your adaptor should get to HDFS correctly.

The archiver, if unmodified, will make a mess of de-duplication, since
it relies on chunk IDs being byte offsets in order to detect
overlapping chunks.
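
To illustrate the problem, here is a rough sketch in Python. It is not
the archiver's actual code. It assumes a simplified model in which a
chunk's sequence ID is the byte offset of the end of its data, and the
names (Chunk, byte_range, overlaps) are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for a chunk: a stream name, a sequence ID, a payload."""
    stream: str
    seq_id: int
    data: bytes


def byte_range(chunk):
    """Half-open byte range [start, end) implied by the chunk.

    Assumption: seq_id is the offset of the end of the chunk's data, so the
    start is seq_id minus the payload length.
    """
    return chunk.seq_id - len(chunk.data), chunk.seq_id


def overlaps(a, b):
    """True if two chunks from the same stream cover intersecting byte ranges."""
    if a.stream != b.stream:
        return False
    a_start, a_end = byte_range(a)
    b_start, b_end = byte_range(b)
    return a_start < b_end and b_start < a_end


# Byte-offset IDs: consecutive chunks are disjoint, a retransmission is caught.
first = Chunk("queue", 100, b"x" * 100)       # bytes 0..100
second = Chunk("queue", 200, b"y" * 100)      # bytes 100..200
retransmit = Chunk("queue", 150, b"z" * 100)  # bytes 50..150

# Message-number IDs: two distinct 100-byte records numbered 1 and 2.
msg1 = Chunk("queue", 1, b"x" * 100)
msg2 = Chunk("queue", 2, b"y" * 100)

print(overlaps(first, second))      # disjoint ranges
print(overlaps(first, retransmit))  # genuine overlap
print(overlaps(msg1, msg2))         # distinct records, but the ranges intersect
```

With message numbers as IDs, the implied ranges for msg1 and msg2
intersect even though the records are distinct, so any check of this
kind would wrongly treat them as overlapping.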


On Wed, Mar 10, 2010 at 5:43 PM, Ellen Strnod wrote:
> I am a new user, contemplating using Chukwa with data which will be read
> from a JMS queue.  I expect to write an adapter which will read from the
> queue and create text records which will be chunked and sent to the
> collector.  My question - does anyone know if the chunk sequence ID, which
> the Chukwa architecture document says is the number of bytes the adapter has
> sent, could be any other repeatable sequential number, or does it have to be
> a byte count?  (To return to this number is a little problematic in case of
> a restart, but the records coming off the queue have IDs which I would
> like to use.)
> I was also looking for a streaming adapter implementation and ran across
> this in Jira: - it seems
> that this adapter may have the same problem (maybe even more so, since it is
> intended to read from a stream rather than a queue) so maybe someone in the
> project has already given this some thought.
> Thanks in advance,
> Ellen

Ari Rabkin
UC Berkeley Computer Science Department