chukwa-dev mailing list archives

From "Ari Rabkin (JIRA)" <>
Subject [jira] Commented: (CHUKWA-338) duplicate suppression in archiver
Date Tue, 30 Jun 2009 08:34:47 GMT


Ari Rabkin commented on CHUKWA-338:

There's one subtle case to think about.  The sequence ID in a chunk is based on the LAST byte
in the chunk.  So what if you have two different chunks that end at the same place, one longer
than the other?

Answer:  Keep track of how much you've already written for that stream, and skip or trim any
chunk whose bytes are already covered.  Slightly tricky code, but not monstrous, since the
reduce gets chunks in sorted order.
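
A minimal sketch of that bookkeeping, in plain Java with no Hadoop dependency.  Everything
here is hypothetical and for illustration only: SimpleChunk is a stand-in, not the real Chukwa
chunk class, and suppress() is not an existing Archiver method.  It assumes the sequence ID is
the stream offset just past the chunk's last byte, that it sees the chunks of a single stream,
and that chunks with equal sequence IDs are visited longest first (the sketch sorts them
itself to make that assumption explicit).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class DuplicateSuppression {

    /** Hypothetical stand-in for a chunk; NOT the real Chukwa chunk class. */
    static final class SimpleChunk {
        final long seqId;   // assumed: stream offset just past the LAST byte of data
        final byte[] data;

        SimpleChunk(long seqId, String text) {
            this.seqId = seqId;
            this.data = text.getBytes(StandardCharsets.US_ASCII);
        }

        long startOffset() {
            return seqId - data.length;
        }
    }

    /** Returns the bytes of one stream with repeated or overlapping data written once. */
    static String suppress(List<SimpleChunk> chunksForOneStream) {
        List<SimpleChunk> sorted = new ArrayList<>(chunksForOneStream);
        // Assumption: order by end offset; on ties, visit the longest chunk first
        // so that the shorter chunk ending at the same place is the one dropped.
        sorted.sort(Comparator
                .comparingLong((SimpleChunk c) -> c.seqId)
                .thenComparing(
                        Comparator.comparingInt((SimpleChunk c) -> c.data.length).reversed()));

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long written = 0; // offset just past the last byte already written for this stream

        for (SimpleChunk c : sorted) {
            if (c.seqId <= written) {
                continue; // every byte of this chunk has already been written
            }
            // Number of leading bytes that overlap with what was already written.
            int overlap = (int) Math.max(0, written - c.startOffset());
            out.write(c.data, overlap, c.data.length - overlap);
            written = c.seqId;
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }
}
```

With chunks "hello" (sequence ID 5, seen twice), "lo world" (11) and "world" (11), this writes
"hello", then only the " world" tail of the longer chunk, and drops the rest.  A single
high-water mark like this cannot recover bytes that first appear in a chunk arriving after a
later-ending one has been written, so a real reducer may need more care than this sketch.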

> duplicate suppression in archiver
> ---------------------------------
>                 Key: CHUKWA-338
>                 URL:
>             Project: Hadoop Chukwa
>          Issue Type: New Feature
>          Components: Data Processors
>            Reporter: Ari Rabkin
> Right now, Archiver uses an identity reducer.
> It should be straightforward to write a custom reducer that does duplicate detection
> and suppression if we get multiple chunks with the same key.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
