chukwa-dev mailing list archives

From Eric Yang <ey...@yahoo-inc.com>
Subject Re: a more fault-tolerant collector
Date Tue, 12 Oct 2010 16:57:20 GMT
I thought that is what it currently does, with one twist: the commit and the response are
asynchronous. The collector exits if the file system is unavailable for an extended period
of time. If it is not doing what's described above, then we definitely should fix it.

Regards,
Eric


On 10/11/10 10:49 PM, "Ariel Rabkin" <asrabkin@gmail.com> wrote:

Howdy.

This is an answer to a question Bill asked me recently: can we
redesign the Collector process to behave better if the filesystem is
unavailable?

I think we can do this by backpressure. If the write fails, the
collector should return an error to the agent. And the agent should
treat it like a post failure, and retry.  Thoughts?
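
To make the proposal concrete, here is a minimal sketch of the backpressure idea, not Chukwa's actual code. Every name in it (Collector, Agent, FlakySink, FileSystemUnavailable) is hypothetical, and the 200/503 status codes are an assumed convention for "written" and "write failed, try again".

```python
class FileSystemUnavailable(Exception):
    """Hypothetical error raised by the sink when a write cannot complete."""


class FlakySink:
    """Stand-in for the filesystem: fails the first `failures` writes."""

    def __init__(self, failures):
        self.failures = failures
        self.stored = []

    def write(self, chunk):
        if self.failures > 0:
            self.failures -= 1
            raise FileSystemUnavailable("filesystem unavailable")
        self.stored.append(chunk)


class Collector:
    """Reports write failures to the caller instead of hiding them."""

    def __init__(self, sink):
        self.sink = sink

    def handle_post(self, chunk):
        try:
            self.sink.write(chunk)
        except FileSystemUnavailable:
            return 503  # assumed status: push the failure back to the agent
        return 200      # assumed status: chunk was written


class Agent:
    """Treats a collector error like a failed post and retries."""

    def __init__(self, collector, max_attempts=5):
        self.collector = collector
        self.max_attempts = max_attempts

    def send(self, chunk):
        # A real agent would wait between attempts; omitted to keep the sketch short.
        for attempt in range(1, self.max_attempts + 1):
            if self.collector.handle_post(chunk) == 200:
                return attempt  # number of attempts it took
        return None  # gave up; the chunk stays with the agent


sink = FlakySink(failures=2)
agent = Agent(Collector(sink))
attempts = agent.send("chunk-1")
```

In this sketch the chunk is never acknowledged until the write has succeeded, so a filesystem outage shows up at the agent as ordinary post failures.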

--Ar

--
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department

