chukwa-dev mailing list archives

From "Ari Rabkin (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CHUKWA-369) proposed reliability mechanism
Date Tue, 04 Aug 2009 19:10:14 GMT

    [ https://issues.apache.org/jira/browse/CHUKWA-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739101#action_12739101 ]

Ari Rabkin commented on CHUKWA-369:
-----------------------------------

One additional refinement:  There are two broad classes of adaptors: those that can reliably
recover data, and those that can't.  File tailers can; exec adaptors, socket listeners, etc.,
can't.

The proposal is that adaptors that can't resume after a crash should explicitly update the
checkpoint state when they send data.  This way, after a crash, we'll get explicit gaps in
the stream for those adaptors, so it'll be obvious to downstream listeners what happened and
there won't be ambiguity about where the data went.
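
To make the intended effect concrete, here is a minimal sketch in Python. It is not Chukwa's
actual code or API: the class `CheckpointSketch`, its methods, and the helper `find_gaps` are
hypothetical names invented for illustration. It also assumes that resumable adaptors commit
their offset only when data is acknowledged, which this comment does not state.

```python
class CheckpointSketch:
    """Hypothetical model of per-adaptor checkpoint offsets (not Chukwa's API)."""

    def __init__(self):
        # adaptor id -> stream offset recorded in the checkpoint
        self.committed = {}

    def on_send(self, adaptor_id, end_offset, can_resume):
        # Adaptors that cannot replay data commit as soon as data is sent,
        # so a restart never reuses offsets that were already handed out.
        if not can_resume:
            self.committed[adaptor_id] = end_offset

    def on_ack(self, adaptor_id, end_offset):
        # Assumption: resumable adaptors commit only on acknowledgement,
        # because they can re-read anything that was not acknowledged.
        current = self.committed.get(adaptor_id, 0)
        self.committed[adaptor_id] = max(current, end_offset)

    def restart_offset(self, adaptor_id):
        # Offset at which the adaptor resumes after a crash.
        return self.committed.get(adaptor_id, 0)


def find_gaps(chunks):
    """Given (start, end) offset ranges seen downstream, return the missing ranges."""
    gaps = []
    expected = 0
    for start, end in sorted(chunks):
        if start > expected:
            gaps.append((expected, start))
        expected = max(expected, end)
    return gaps


if __name__ == "__main__":
    cp = CheckpointSketch()
    cp.on_send("exec", 100, can_resume=False)  # bytes 0-100 arrive downstream
    cp.on_send("exec", 250, can_resume=False)  # bytes 100-250 are lost in a crash
    resume_at = cp.restart_offset("exec")      # adaptor restarts past the lost data
    received = [(0, 100), (resume_at, resume_at + 50)]
    print(find_gaps(received))
```

In this sketch, a downstream listener can detect the loss purely from the offsets. If the
checkpoint had instead stayed at 100, the restarted adaptor would label fresh data with
offsets 100 onward and the stream would look contiguous even though data was lost.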

This last change should maybe be its own JIRA -- it's a pretty compact fix, I think only a
single line.

> proposed reliability mechanism
> ------------------------------
>
>                 Key: CHUKWA-369
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-369
>             Project: Hadoop Chukwa
>          Issue Type: New Feature
>          Components: data collection
>    Affects Versions: 0.3.0
>            Reporter: Ari Rabkin
>             Fix For: 0.3.0
>
>
> We like to say that Chukwa is a system for reliable log collection. It isn't, quite,
> since we don't handle collector crashes.  Here's a proposed reliability mechanism.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

