incubator-chukwa-dev mailing list archives

From "Jerome Boulon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CHUKWA-338) duplicate suppression in archiver
Date Mon, 13 Jul 2009 20:59:14 GMT

    [ https://issues.apache.org/jira/browse/CHUKWA-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730510#action_12730510 ]

Jerome Boulon commented on CHUKWA-338:
--------------------------------------

Ari,
Yes, a secondary sort (grouping comparator) will solve the issue, but I'm not sure that all current
adaptors follow the virtual-offset concept, so that would be the first thing to
validate.
Also, if you have more than one value for the same key, you may want to double-check that
they actually have the same size and content, to make sure it's a real duplicate and not an issue
with the virtual offset, especially after rotation.
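
The secondary-sort idea can be sketched in plain Java without Hadoop. This is a minimal sketch under stated assumptions: the `Key` class, the choice of stream name as the natural key and sequence id as the secondary field, and the `shuffle` helper that emulates Hadoop's sort-then-group step are hypothetical illustrations, not Chukwa's actual archiver code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SecondarySortSketch {
    // Hypothetical composite key: natural key = stream name,
    // secondary field = sequence id (the chunk's virtual offset).
    static final class Key {
        final String stream;
        final long seqId;

        Key(String stream, long seqId) {
            this.stream = stream;
            this.seqId = seqId;
        }
    }

    // Sort comparator: orders by the full composite key (stream, then seqId).
    static final Comparator<Key> SORT =
        Comparator.comparing((Key k) -> k.stream).thenComparingLong(k -> k.seqId);

    // Grouping comparator: looks only at the natural key, so one "reduce call"
    // sees every chunk of a stream, already ordered by seqId.
    static final Comparator<Key> GROUP = Comparator.comparing((Key k) -> k.stream);

    /** Emulates the shuffle: sort all keys, then start a new group whenever GROUP differs. */
    static List<List<Key>> shuffle(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<Key>> groups = new ArrayList<>();
        List<Key> current = null;
        for (Key k : sorted) {
            if (current == null || GROUP.compare(current.get(0), k) != 0) {
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(k);
        }
        return groups;
    }

    /** Within one group, a seqId equal to its predecessor's marks a candidate duplicate. */
    static List<Key> candidateDuplicates(List<Key> group) {
        List<Key> dups = new ArrayList<>();
        for (int i = 1; i < group.size(); i++) {
            if (group.get(i).seqId == group.get(i - 1).seqId) {
                dups.add(group.get(i));
            }
        }
        return dups;
    }
}
```

With this arrangement each group holds a whole stream in seqId order, so adjacent equal seqIds are only *candidate* duplicates; whether that is meaningful depends on every adaptor producing consistent virtual offsets, which is the point to validate first.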

Since I see the archiver as a background process, it should not be too costly to always
check for real duplicates versus false duplicates (same SequenceId but different content).
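
The real-vs-false duplicate check could look roughly like this, assuming the chunks sharing one (stream, seqId) key have already been brought together. `Chunk`, `Outcome`, and `reduce` are hypothetical names for illustration; a real archiver reducer would use Chukwa's own key and chunk types and emit through Hadoop's reducer API rather than returning a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DuplicateCheck {
    // Hypothetical stand-in for a chunk: only the fields this check needs.
    static final class Chunk {
        final String stream;
        final long seqId;
        final byte[] data;

        Chunk(String stream, long seqId, byte[] data) {
            this.stream = stream;
            this.seqId = seqId;
            this.data = data;
        }
    }

    static final class Outcome {
        final List<Chunk> kept = new ArrayList<>();
        int realDuplicates;   // same key, same bytes: safe to drop
        int falseDuplicates;  // same key, different bytes: kept, worth reporting
    }

    /**
     * Handles all chunks that share one (stream, seqId) key. A chunk is dropped
     * only if its bytes equal those of a chunk already kept. Arrays.equals
     * compares array lengths before contents, so the size check comes first.
     */
    static Outcome reduce(List<Chunk> sameKey) {
        Outcome out = new Outcome();
        for (Chunk c : sameKey) {
            boolean seen = false;
            for (Chunk k : out.kept) {
                if (Arrays.equals(k.data, c.data)) {
                    seen = true;
                    break;
                }
            }
            if (seen) {
                out.realDuplicates++;
            } else {
                // A second distinct payload under the same key suggests a
                // virtual-offset problem (e.g. after rotation), not a duplicate.
                if (!out.kept.isEmpty()) {
                    out.falseDuplicates++;
                }
                out.kept.add(c);
            }
        }
        return out;
    }
}
```

Keeping false duplicates rather than dropping them trades some extra output for never silently losing data when the virtual offsets are wrong.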

> duplicate suppression in archiver
> ---------------------------------
>
>                 Key: CHUKWA-338
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-338
>             Project: Hadoop Chukwa
>          Issue Type: New Feature
>          Components: Data Processors
>            Reporter: Ari Rabkin
>            Assignee: Ari Rabkin
>             Fix For: 0.3.0
>
>         Attachments: archiveDupSuppress.patch
>
>
> Right now, Archiver uses an identity reducer.
> It should be straightforward to write a custom reducer that does duplicate detection
> and suppression if we get multiple chunks with the same key.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

