incubator-chukwa-user mailing list archives

From Corbin Hoenes <>
Subject Re: Seeing duplicate entries
Date Sat, 23 Oct 2010 00:21:40 GMT
Eric, in Chukwa 0.5 is HBase the final store instead of HDFS? What
format will the HBase data be in (e.g. a ChukwaRecord object, or
something user-configurable)?


On Oct 22, 2010, at 8:48 AM, Eric Yang <> wrote:

> Hi Matt,
> This is expected in Chukwa archives.  When an agent is unable to post to
> the collector, it retries posting the same data to another collector, or
> retries with the same collector when no other collector is available.
> Under high load, a collector may write the data without a proper
> acknowledgement getting back to the agent.  The Chukwa philosophy is to
> retry until an acknowledgement is received.  Duplicate data is filtered
> out after the data has been received.
> The duplicate filtering in Chukwa 0.3.0 depends on loading the data into
> MySQL.  Records with the same primary key update the same row, which
> removes duplicates.  It is possible to build a duplicate detection process
> prior to demux that filters data based on sequence id + data type +
> csource (host), but this hasn't been implemented because the primary key
> update method works well for my use case.
> In Chukwa 0.5, we treat duplicates the same way as in Chukwa 0.3:
> any duplicated row in HBase is replaced, based on timestamp +
> HBase row key.
> regards,
> Eric
> On Thu, Oct 21, 2010 at 8:22 PM, Matt Davies <> wrote:
>> Hey everyone,
>> I have a situation where I'm seeing duplicated data downstream
>> before the demux process. It appears this happens during high
>> system loads, and we are still using the 0.3.0 series.
>> We have validated that there is a single, unique entry in our
>> source file, which then shows up a random number of times before we
>> see it in demux. So it appears that the duplication is happening
>> somewhere between the agent and collector.
>> Has anyone else seen this? Any ideas as to why we are seeing this
>> during high system loads, but not during lower loads?
>> TIA,
>> Matt
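
For illustration, the two approaches Eric describes (a filter ahead of demux keyed on sequence id + data type + source, and a keyed write where a duplicate overwrites the same row) could look roughly like the sketch below. This is not Chukwa code: the `Chunk` class, its field names, and both functions are hypothetical stand-ins, and the in-memory set assumes the batch is small enough to track every key seen.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical stand-in for a collected record; not a Chukwa class.
@dataclass(frozen=True)
class Chunk:
    seq_id: int      # sequence id assigned at the agent
    data_type: str   # e.g. "SysLog"
    source: str      # originating host
    payload: str

Key = Tuple[int, str, str]

def dedup_key(chunk: Chunk) -> Key:
    """Identity of a chunk: sequence id + data type + source host."""
    return (chunk.seq_id, chunk.data_type, chunk.source)

def drop_duplicates(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    """Pre-demux style filter: pass only the first chunk seen per key."""
    seen = set()
    for chunk in chunks:
        key = dedup_key(chunk)
        if key in seen:
            continue  # a retried post already delivered this chunk
        seen.add(key)
        yield chunk

def upsert_by_key(chunks: Iterable[Chunk]) -> Dict[Key, str]:
    """Primary-key style dedup: a duplicate overwrites the same row."""
    table: Dict[Key, str] = {}
    for chunk in chunks:
        table[dedup_key(chunk)] = chunk.payload
    return table

# A retried post delivers the same chunk twice.
received = [
    Chunk(100, "SysLog", "host1", "line A"),
    Chunk(100, "SysLog", "host1", "line A"),
    Chunk(200, "SysLog", "host1", "line B"),
]
unique = list(drop_duplicates(received))
rows = upsert_by_key(received)
```

Either way the retried chunk collapses to a single entry. The difference is where the cost falls: the filter needs state that grows with the number of distinct keys, while the keyed write pushes the work onto the store.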
