chukwa-dev mailing list archives

From "Jerome Boulon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CHUKWA-369) proposed reliability mechanism
Date Wed, 05 Aug 2009 17:19:14 GMT

    [ https://issues.apache.org/jira/browse/CHUKWA-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739626#action_12739626 ]

Jerome Boulon commented on CHUKWA-369:
--------------------------------------

My 2 cents: the LocalWriter checks the free disk space and bails out when it drops below
a configured minimum:

if (freeSpace < minFreeAvailable) {
  log.fatal("No space left on device, Bail out!");
  DaemonWatcher.bailout(-1);
}
 
So the LocalWriter should do the job for now.
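
For illustration, the same kind of check can be sketched as a standalone Java class. This is
only a sketch, not the LocalWriter code: the class name DiskSpaceCheck, the hasEnoughSpace
helper, the directory being checked, and the 1 MB threshold are all assumptions made for the
example. File.getUsableSpace() is the standard JDK call.

```java
import java.io.File;

// Hypothetical standalone version of a free-space guard; not Chukwa's LocalWriter.
class DiskSpaceCheck {

    // Returns true when the free space is at or above the required minimum.
    static boolean hasEnoughSpace(long freeBytes, long minFreeBytes) {
        return freeBytes >= minFreeBytes;
    }

    public static void main(String[] args) {
        // Assumed output directory; a real writer would use its configured path.
        File outputDir = new File(System.getProperty("java.io.tmpdir"));
        long minFreeBytes = 1024L * 1024L; // assumed threshold of 1 MB

        long freeBytes = outputDir.getUsableSpace();
        if (!hasEnoughSpace(freeBytes, minFreeBytes)) {
            System.err.println("No space left on device, bailing out");
            System.exit(-1);
        }
        System.out.println("Free space check passed: " + freeBytes + " bytes usable");
    }
}
```

Keeping the comparison in a small pure method makes the threshold logic easy to test without
touching a real disk.
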
Regarding HDFS, there is work in progress to allow writing redo logs to HDFS, so once that
is available we will take advantage of it.
People who still want to use the HDFS writer can change some HDFS parameters to reduce
the time before the HDFS client detects a problem.
Facebook also uses two overlapping HDFS clusters to get HA from Scribe's point of view. The
missing part on our side is a secondary writer, but that could easily be implemented.
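
As a rough idea of what a secondary writer could look like, here is a minimal failover
sketch. Everything in it is hypothetical: ChunkSink and FailoverSink are names invented for
this example and are not part of Chukwa's writer API, and chunks are plain strings to keep
it short.

```java
import java.io.IOException;

// Hypothetical minimal sink interface; not Chukwa's actual writer interface.
interface ChunkSink {
    void write(String chunk) throws IOException;
}

// Tries the primary sink first and falls back to the secondary if the primary fails.
class FailoverSink implements ChunkSink {
    private final ChunkSink primary;
    private final ChunkSink secondary;

    FailoverSink(ChunkSink primary, ChunkSink secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    @Override
    public void write(String chunk) throws IOException {
        try {
            primary.write(chunk);
        } catch (IOException primaryFailure) {
            // Primary is unavailable: hand the same chunk to the secondary.
            // If the secondary also fails, its exception propagates to the caller.
            secondary.write(chunk);
        }
    }
}
```

A real implementation would also have to decide when to switch back to the primary and how
to avoid duplicate chunks, which this sketch does not address.
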



> proposed reliability mechanism
> ------------------------------
>
>                 Key: CHUKWA-369
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-369
>             Project: Hadoop Chukwa
>          Issue Type: New Feature
>          Components: data collection
>    Affects Versions: 0.3.0
>            Reporter: Ari Rabkin
>             Fix For: 0.3.0
>
>
> We like to say that Chukwa is a system for reliable log collection. It isn't, quite,
> since we don't handle collector crashes.  Here's a proposed reliability mechanism.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

