incubator-chukwa-user mailing list archives

From Corbin Hoenes <cor...@tynt.com>
Subject Re: duplicate data
Date Fri, 19 Mar 2010 04:50:15 GMT
So in that scenario the stream name should be the same, but how do sequence IDs get generated?
If I tailed the same log file 24 hours after doing it the first time, would the chunks have the
same seq ID?

On Mar 18, 2010, at 11:24 AM, Ariel Rabkin wrote:

> Howdy,
> 
> Chukwa does duplicate detection as follows: Each Chunk of data comes
> with a stream name (such as the name of a log file) and a sequence ID.
> If two chunks have the same name and ID, they're duplicates.  The
> content isn't inspected.
> 
> So in your example, the former will be treated as a duplicate, not the latter.
> 
> --Ari
> 
> On Thu, Mar 18, 2010 at 8:59 AM, Corbin Hoenes <corbin@tynt.com> wrote:
>> Does anyone have more information about how Chukwa removes duplicates during demux?
>> How does it decide what is a duplicate?  There are two cases I am thinking of...
>> 
>> 1 - we send the same log file to chukwa 2x
>> 2 - we have the exact same line in a log file 2x
> 
> 
> 
> -- 
> Ari Rabkin asrabkin@gmail.com
> UC Berkeley Computer Science Department
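
The rule Ari describes above (two chunks are duplicates when they share a stream name and a sequence ID, with the content never inspected) can be sketched roughly as follows. This is only an illustration, not Chukwa's actual code: the `Chunk` type, its field names, the `drop_duplicates` helper, and the sequence ID values are all hypothetical, and the sketch says nothing about how Chukwa actually assigns sequence IDs.

```python
from typing import Iterable, Iterator, NamedTuple


class Chunk(NamedTuple):
    """Hypothetical stand-in for a Chukwa chunk; field names are assumptions."""
    stream_name: str  # e.g. the name of the log file
    seq_id: int       # sequence ID attached to the chunk
    data: bytes       # payload; deliberately ignored by the duplicate check


def drop_duplicates(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    """Yield each chunk whose (stream name, sequence ID) pair is new."""
    seen = set()
    for chunk in chunks:
        key = (chunk.stream_name, chunk.seq_id)
        if key in seen:
            continue  # same name and ID: treated as a duplicate
        seen.add(key)
        yield chunk


# Made-up IDs for illustration only.
chunks = [
    Chunk("/var/log/app.log", 100, b"same line\n"),
    Chunk("/var/log/app.log", 200, b"same line\n"),  # same content, different ID
    Chunk("/var/log/app.log", 100, b"same line\n"),  # same name and ID as the first
]
kept = list(drop_duplicates(chunks))
```

Under these assumptions, the second chunk is kept even though its content matches the first, because its sequence ID differs. The third chunk is dropped because its name and ID repeat the first. Whether re-sending the same file reproduces the same IDs depends on how the IDs are generated, which this sketch does not model.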

