incubator-chukwa-user mailing list archives

From Ariel Rabkin <asrab...@gmail.com>
Subject Re: Chunk sequence IDs other than byte counts?
Date Thu, 11 Mar 2010 02:09:04 GMT
Howdy!

The agent and collector code assumes that sequence IDs increase
monotonically, so as long as yours do, data from your adaptor should
get to HDFS correctly.

The archiver, if unmodified, will make a mess of de-duplication, since
it relies on chunk IDs being byte offsets in order to detect
overlapping chunks.
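
To make the overlap problem concrete, here is a rough sketch of the kind of
check that only works with byte-offset IDs. It is not the archiver's actual
code: the class and method names are made up for illustration, and it assumes
a chunk's sequence ID is the stream offset just past the chunk's last byte.

```java
// Hypothetical illustration only; not taken from the Chukwa archiver.
final class ChunkOverlapSketch {

    /**
     * Start offset implied by a sequence ID, ASSUMING the ID is the
     * stream offset just past the last byte of the chunk's data.
     */
    static long impliedStart(long seqId, int dataLength) {
        return seqId - dataLength;
    }

    /** True if the byte ranges implied by the two chunks intersect. */
    static boolean overlaps(long seqIdA, int lenA, long seqIdB, int lenB) {
        return impliedStart(seqIdA, lenA) < seqIdB
            && impliedStart(seqIdB, lenB) < seqIdA;
    }
}
```

With byte offsets, a 100-byte chunk with ID 100 and a 150-byte chunk with
ID 250 cover [0,100) and [100,250) and do not overlap. With queue record IDs
such as 1001 and 1002 on two 200-byte chunks, the implied ranges [801,1001)
and [802,1002) overlap even though the records are distinct, which is the
sort of mess I mean.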

--Ari

On Wed, Mar 10, 2010 at 5:43 PM, Ellen Strnod <estrnod@annealsoft.com> wrote:
> I am a new user, contemplating using Chukwa with data which will be read
> from a JMS queue.  I expect to write an adapter which will read from the
> queue and create text records which will be chunked and sent to the
> collector.  My question - does anyone know if the chunk sequence ID, which
> the Chukwa architecture document says is the number of bytes the adapter has
> sent, could be any other repeatable sequential number, or does it have to be
> a byte count?  (Returning to this number is a little problematic in case of
> a restart, but the records coming off the queue have IDs which I would
> like to use.)
>
> I was also looking for a streaming adapter implementation and ran across
> this in Jira: http://issues.apache.org/jira/browse/CHUKWA-102 - it seems
> that this adapter may have the same problem (maybe even more so, since it is
> intended to read from a stream rather than a queue) so maybe someone in the
> project has already given this some thought.
>
> Thanks in advance,
> Ellen
>
>
>



-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department
