chukwa-user mailing list archives

From Ariel Rabkin
Subject Re: Seeing duplicate entries
Date Fri, 22 Oct 2010 16:03:51 GMT
On Fri, Oct 22, 2010 at 8:48 AM, Eric Yang wrote:
> Hi Matt,

> The duplication filtering in Chukwa 0.3.0 depends on data loading to
> mysql.  The same primary key will update to the same row to remove
> duplicates.  It is possible to build a duplication detection process
> prior to demux which filter data based on sequence id + data type +
> csource (host), but this hasn't been implemented because primary key
> update method works well for my use case.

This isn't quite right. There is support in 0.3 and later versions for
doing de-duplication at the collector, in the manner Eric describes.
It works as a filter in the writer pipeline.
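To illustrate the idea only, here is a rough sketch of a de-duplication filter sitting in front of a writer in a pipeline. It is not Chukwa's code or API: the names Chunk, DedupStage, CollectingWriter and run_pipeline are hypothetical, the bounded cache with oldest-first eviction is an assumption, and Python is used purely for brevity. The key is the sequence id + data type + source combination mentioned above.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for a chunk arriving at the collector."""
    source: str      # originating host
    data_type: str   # data type of the chunk
    seq_id: int      # sequence id assigned upstream
    payload: str


class DedupStage:
    """Drops chunks whose (source, data_type, seq_id) key was already seen.

    The cache is bounded (an assumption of this sketch), so a duplicate
    that arrives after its key has been evicted will pass through.
    """

    def __init__(self, max_keys: int = 10_000) -> None:
        self._seen: "OrderedDict[tuple, None]" = OrderedDict()
        self._max_keys = max_keys

    def process(self, chunks: Iterable[Chunk]) -> List[Chunk]:
        fresh = []
        for chunk in chunks:
            key = (chunk.source, chunk.data_type, chunk.seq_id)
            if key in self._seen:
                continue  # duplicate: do not pass it downstream
            self._seen[key] = None
            if len(self._seen) > self._max_keys:
                self._seen.popitem(last=False)  # evict the oldest key
            fresh.append(chunk)
        return fresh


class CollectingWriter:
    """Hypothetical final stage; stores chunks in memory instead of writing."""

    def __init__(self) -> None:
        self.written: List[Chunk] = []

    def process(self, chunks: Iterable[Chunk]) -> List[Chunk]:
        batch = list(chunks)
        self.written.extend(batch)
        return batch


def run_pipeline(stages, chunks):
    """Pass a batch of chunks through each stage in order."""
    for stage in stages:
        chunks = stage.process(chunks)
    return chunks
```

Putting the filter ahead of the writer means duplicates are dropped before anything is written, rather than relying on a primary-key update at load time.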

You need the following in your configuration:



See for background


Ari Rabkin
UC Berkeley Computer Science Department
