incubator-flume-dev mailing list archives

From "Alexander Lorenz-Alten (Reopened) (JIRA)" <>
Subject [jira] [Reopened] (FLUME-629) DFO failure, stops buffering to disk, messages lost
Date Wed, 07 Mar 2012 18:48:57 GMT


Alexander Lorenz-Alten reopened FLUME-629:

> DFO failure, stops buffering to disk, messages lost
> ---------------------------------------------------
>                 Key: FLUME-629
>                 URL:
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v0.9.3
>            Reporter: Disabled imported user
>            Priority: Critical
> Single master
> agent: syslogTcp | agentE2EChain
> collector: collectorSource | collectorSink("hdfs://...")
> From reading through various logs, this is, I believe, the order of events:
> - NameNode crashed
> - This caused collector to fail writes to hdfs
> - Which in turn caused agents to start backing up and buffering on disk (correct so far)
> - WatchDog caught a crash and restarted the Flume Master
> - Eventually the DFO stops writing to disk but keeps trying to pass messages
> - ACKs continue to fail and eventually nothing is passed
> Disk space was fine throughout. We had another agent node which continued to operate
> normally during this period and buffered all messages as expected. Here's a snip of some of
> the relevant sections of log files:
> I can provide the full log files if they will be of use. 

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:!default.jspa
For more information on JIRA, see:

