chukwa-dev mailing list archives

From "Ari Rabkin (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CHUKWA-369) proposed reliability mechanism
Date Wed, 02 Sep 2009 18:18:32 GMT

    [ https://issues.apache.org/jira/browse/CHUKWA-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750580#action_12750580 ]

Ari Rabkin commented on CHUKWA-369:
-----------------------------------

Yes; I'm measuring that.  Chunks only get duplicated when a collector crashes, and the volume
of duplicate chunks is basically just the amount of data lost from the in-progress .chukwa files.
So for a single collector failure, the duplicated volume is (write rate) * (period between rotations).
Since the total data written over one mean time between failures is (write rate) * (mean time
between failures), the write rate cancels and the fraction of duplicate data is just
(period between rotations) / (mean time between failures).

So if you assume that collectors crash once a week on average, and that the rotation period
is five minutes, then the fraction of duplicate data is about 0.05%.

And my measurements bear this out.
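
For concreteness, a back-of-envelope sketch of that arithmetic (the one-week MTBF and
five-minute rotation period are just the assumed figures above; the variable names are
illustrative only):

    # Duplicate-data estimate, assuming one collector failure per week (MTBF)
    # and a five-minute rotation period for .chukwa sink files.
    rotation_period_min = 5.0        # period between rotations, in minutes
    mtbf_min = 7 * 24 * 60.0         # one week, in minutes

    # Data duplicated per failure is (write rate) * (rotation period); total data
    # over one MTBF is (write rate) * (MTBF), so the write rate cancels and the
    # duplicate fraction is just the ratio of the two periods.
    duplicate_fraction = rotation_period_min / mtbf_min
    print("duplicate fraction: %.4f%%" % (duplicate_fraction * 100))  # ~0.0496%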

> proposed reliability mechanism
> ------------------------------
>
>                 Key: CHUKWA-369
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-369
>             Project: Hadoop Chukwa
>          Issue Type: New Feature
>          Components: data collection
>    Affects Versions: 0.3.0
>            Reporter: Ari Rabkin
>            Assignee: Ari Rabkin
>             Fix For: 0.3.0
>
>         Attachments: delayedAcks.patch
>
>
> We like to say that Chukwa is a system for reliable log collection. It isn't, quite,
> since we don't handle collector crashes.  Here's a proposed reliability mechanism.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

