incubator-chukwa-user mailing list archives

From Corbin Hoenes <corbinhoe...@gmail.com>
Subject Re: Seeing duplicate entries
Date Sat, 23 Oct 2010 20:27:04 GMT
+1

I imagine it is just another pipelineable class loaded into the
collector? If so, Bill's scenario would work.

Sent from my iPhone
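[Editor's note: the "pipelineable class" idea refers to the collector's configurable chain of writers. A hypothetical sketch of what that configuration might look like in chukwa-collector-conf.xml is below; the property and class names follow the Chukwa trunk conventions of the time, but treat them as assumptions rather than verified 0.5 settings.]

```xml
<!-- Hypothetical sketch: chain an HBase writer in front of the
     sequence-file writer so data goes to HBase AND continues to HDFS.
     Class names are assumptions based on Chukwa trunk, not verified. -->
<property>
  <name>chukwaCollector.pipeline</name>
  <value>org.apache.hadoop.chukwa.datacollection.writer.hbase.HBaseWriter,org.apache.hadoop.chukwa.datacollection.writer.SeqFileWriter</value>
</property>
```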

On Oct 23, 2010, at 12:59 PM, Bill Graham <billgraham@gmail.com> wrote:

> Eric, I'm also curious about how the HBase integration works. Do you
> have time to write something up on it? I'm interested in the
> possibility of extending what's there to write my own custom data into
> HBase from a collector, while said data also continues through to HDFS
> as it does currently.
>
>
> On Fri, Oct 22, 2010 at 5:21 PM, Corbin Hoenes <corbinhoenes@gmail.com> wrote:
>> Eric, in Chukwa 0.5 is HBase the final store instead of HDFS? What
>> format will the HBase data be in (e.g. a ChukwaRecord object?
>> Something user-configurable?)
>>
>> Sent from my iPhone
>>
>> On Oct 22, 2010, at 8:48 AM, Eric Yang <eric818@gmail.com> wrote:
>>
>>> Hi Matt,
>>>
>>> This is expected in Chukwa archives.  When an agent is unable to
>>> post to the collector, it retries posting the same data to another
>>> collector, or retries with the same collector when no other
>>> collector is available.  Under high load, a collector may also have
>>> written data without the acknowledgement making it back to the
>>> agent.  The Chukwa philosophy is to retry until an acknowledgement
>>> is received; duplicated data is filtered after the data has been
>>> received.
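[Editor's note: the retry-until-acknowledged behavior Eric describes, and how it produces duplicates when an ack is lost, can be sketched as follows. The agent and collector objects here are hypothetical stand-ins for illustration, not Chukwa classes.]

```python
# Sketch of retry-until-ack delivery. The collector stores a chunk but
# may fail to acknowledge it under load, so the agent re-sends.
class FlakyCollector:
    def __init__(self, drop_acks_for_first_n=1):
        self.stored = []               # chunks actually written
        self.drop = drop_acks_for_first_n

    def post(self, chunk):
        self.stored.append(chunk)      # data is written...
        if self.drop > 0:              # ...but the ack is lost under load
            self.drop -= 1
            return False
        return True                    # ack received

def send_with_retry(collector, chunk, max_attempts=5):
    """Agent side: keep re-posting the same chunk until acknowledged."""
    for _ in range(max_attempts):
        if collector.post(chunk):
            return True
    return False

collector = FlakyCollector(drop_acks_for_first_n=1)
send_with_retry(collector, "chunk-42")
# The chunk was stored twice: once unacknowledged, once acknowledged.
print(collector.stored)  # ['chunk-42', 'chunk-42']
```

This is why duplication shows up more under high load: lost acks trigger re-sends of data that was in fact written.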
>>>
>>> The duplication filtering in Chukwa 0.3.0 depends on data loading
>>> into MySQL; inserting with the same primary key updates the same
>>> row, which removes duplicates.  It is possible to build a
>>> duplication-detection process prior to demux which filters data
>>> based on sequence id + data type + csource (host), but this hasn't
>>> been implemented because the primary-key update method works well
>>> for my use case.
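[Editor's note: the pre-demux filter Eric outlines could be sketched as a set keyed on that triple. This is purely illustrative; as he says, nothing like it exists in Chukwa 0.3.0, and the dict field names are assumptions.]

```python
def dedup(chunks):
    """Hypothetical pre-demux filter keyed on
    (sequence id, data type, csource), as suggested in the thread."""
    seen = set()
    for chunk in chunks:
        key = (chunk["seqid"], chunk["datatype"], chunk["csource"])
        if key in seen:
            continue               # drop the duplicate
        seen.add(key)
        yield chunk

chunks = [
    {"seqid": 1, "datatype": "SysLog", "csource": "host1", "data": "a"},
    {"seqid": 1, "datatype": "SysLog", "csource": "host1", "data": "a"},  # dup
    {"seqid": 2, "datatype": "SysLog", "csource": "host1", "data": "b"},
]
print(len(list(dedup(chunks))))  # 2
```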
>>>
>>> In Chukwa 0.5, we are treating duplication the same as in Chukwa
>>> 0.3: any duplicated row in HBase is replaced, based on Timestamp +
>>> HBase row key.
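[Editor's note: that replacement amounts to last-write-wins upsert semantics on the row key. A toy model, with a plain dict standing in for an HBase table, makes the collapse of duplicates concrete; the key layout here is illustrative only.]

```python
# Toy model of HBase put semantics: writing the same row key again
# replaces the row, so re-sent duplicates collapse to a single entry.
table = {}

def put(row_key, value):
    table[row_key] = value   # same key -> row replaced, not appended

put(("1287870000", "host1-SysLog"), "first copy")
put(("1287870000", "host1-SysLog"), "duplicate copy")  # overwrites
print(len(table))  # 1
```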
>>>
>>> regards,
>>> Eric
>>>
>>> On Thu, Oct 21, 2010 at 8:22 PM, Matt Davies <matt.davies@tynt.com> wrote:
>>>>
>>>> Hey everyone,
>>>>
>>>> I have a situation where I'm seeing duplicated data downstream,
>>>> before the demux process. It appears this happens during high
>>>> system loads, and we are still using the 0.3.0 series.
>>>>
>>>> We have validated that there is a single, unique entry in our
>>>> source file, which then shows up a random number of times before we
>>>> see it in demux. So it appears that duplication is happening
>>>> somewhere between the agent and the collector.
>>>>
>>>> Has anyone else seen this? Any ideas as to why we are seeing this
>>>> during high system loads, but not during lower loads?
>>>>
>>>> TIA,
>>>> Matt
>>>>
>>>>
>>
