chukwa-user mailing list archives

From Ariel Rabkin <asrab...@gmail.com>
Subject Re: duplicate data
Date Thu, 18 Mar 2010 17:24:24 GMT
Howdy,

Chukwa does duplicate detection as follows: Each Chunk of data comes
with a stream name (such as the name of a log file) and a sequence ID.
If two chunks have the same name and ID, they're duplicates. The
content isn't inspected.

So in your example, the first case (sending the same log file twice) will be treated as a duplicate; the second (the same line appearing twice in a log file) will not.
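To make the rule concrete, here is a minimal Python sketch of duplicate detection keyed only on (stream name, sequence ID). It is not Chukwa's code: the `Chunk` class, its field names, the `dedup` function, and the sample path and sequence IDs are hypothetical, chosen just to replay the two cases from the question.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for a Chukwa chunk, with only the fields that matter here."""
    stream_name: str  # e.g. the name of the source log file
    seq_id: int       # position of the chunk within its stream (illustrative values)
    data: bytes       # payload; never looked at by the duplicate check


def dedup(chunks: Iterable[Chunk]) -> List[Chunk]:
    """Keep the first chunk seen for each (stream_name, seq_id) pair."""
    seen: Set[Tuple[str, int]] = set()
    kept: List[Chunk] = []
    for chunk in chunks:
        key = (chunk.stream_name, chunk.seq_id)  # content is deliberately not part of the key
        if key in seen:
            continue  # same name and ID: treated as a duplicate and dropped
        seen.add(key)
        kept.append(chunk)
    return kept


line = b"GET /index.html 200\n"

# Case 1: the same log file is sent twice, so name and ID repeat.
resent = [Chunk("/var/log/app.log", 20, line), Chunk("/var/log/app.log", 20, line)]

# Case 2: one file contains the same line twice, at different positions.
repeated_line = [Chunk("/var/log/app.log", 20, line), Chunk("/var/log/app.log", 40, line)]

print(len(dedup(resent)))         # 1
print(len(dedup(repeated_line)))  # 2
```

Because the payload never enters the key, identical lines survive as long as they sit at different positions in the stream, while a resent file collapses to one copy only if it arrives under the same stream name with the same sequence IDs.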

--Ari

On Thu, Mar 18, 2010 at 8:59 AM, Corbin Hoenes <corbin@tynt.com> wrote:
> Does anyone have more information about how chukwa removes duplicates during demux? How
> does it decide what is a duplicate?  There are two cases I am thinking of...
>
> 1 - we send the same log file to chukwa 2x
> 2 - we have the exact same line in a log file 2x



-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department
