incubator-chukwa-user mailing list archives

From Ariel Rabkin <asrab...@gmail.com>
Subject Re: Data loss on collector side
Date Thu, 28 Oct 2010 16:51:22 GMT
Yes, the Agent will resend. The checkpoint state will not be advanced
until a 200 is received from a collector.
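
As a rough illustration of that rule (not the actual Chukwa agent code),
here is a minimal Python sketch. The class name AgentSketch, the
Collector callable type, and the max_attempts parameter are all
hypothetical; a collector is modelled as a function that takes a batch
of chunks and returns an HTTP status code.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical stand-in for a collector: takes a batch of chunks and
# returns an HTTP status code (or raises OSError on a network failure).
Collector = Callable[[Sequence[bytes]], int]


class AgentSketch:
    """Illustrative only; this is not the real Chukwa agent."""

    def __init__(self, collectors: List[Collector]) -> None:
        self.collectors = collectors
        self.checkpoint = 0  # count of chunks acknowledged with a 200

    def send_pending(self, chunks: Sequence[bytes], max_attempts: int = 3) -> bool:
        # Everything past the checkpoint is still unacknowledged.
        pending = chunks[self.checkpoint:]
        if not pending:
            return True
        for attempt in range(max_attempts):
            # Rotate through the collectors so a retry can hit another one.
            collector = self.collectors[attempt % len(self.collectors)]
            status: Optional[int]
            try:
                status = collector(pending)
            except OSError:
                status = None
            if status == 200:
                # Only a 200 advances the checkpoint.
                self.checkpoint += len(pending)
                return True
        # No 200 received: the checkpoint is unchanged, so the same
        # chunks are sent again on the next call.
        return False
```

Because the checkpoint only moves on a 200, a collector that wrote part
of a batch before failing will see the whole batch resent, which is
where the duplicates come from.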

Yes, the demux processing is intended to remove duplicates; if it
doesn't, that's a bug.
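
To make the idea concrete, here is a small Python sketch of duplicate
removal. It is not the demux implementation; it assumes, purely for
illustration, that a chunk can be identified by its source host, stream
name, and sequence ID, and the names Chunk and drop_duplicates are made
up for this example.

```python
from typing import Iterable, Iterator, NamedTuple, Set, Tuple


class Chunk(NamedTuple):
    # Hypothetical chunk shape; the identifying fields are an assumption.
    source: str
    stream: str
    seq_id: int
    data: bytes


def drop_duplicates(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    """Yield each chunk once, keyed on (source, stream, seq_id)."""
    seen: Set[Tuple[str, str, int]] = set()
    for chunk in chunks:
        key = (chunk.source, chunk.stream, chunk.seq_id)
        if key in seen:
            # Same chunk written twice, e.g. after an agent resend.
            continue
        seen.add(key)
        yield chunk
```

The first copy of each chunk is kept and later copies are skipped, so a
batch that reached HDFS through two collectors collapses back to one
copy of each record.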


On Thu, Oct 28, 2010 at 7:58 AM, Jaydeep Ayachit
<jaydeep_ayachit@persistent.co.in> wrote:
> As per the collector design, the collector accepts multiple chunks and
> writes each chunk to HDFS. If all the chunks are written to HDFS, the
> collector sends back a 200 status to the agent.
>
> If an HDFS write fails partway through, the collector aborts the entire
> request and returns an exception. This could mean that the data is
> partially written to HDFS. I have a couple of questions:
>
>
>
> 1. The agent does not receive a 200 response. Does it resend the same
> data to another collector? How does checkpointing work in this case?
>
> 2. If the agent sends the same data to another collector and it goes to
> HDFS, some records are duplicated. Are those duplicates filtered
> when the preprocessor runs?
>
>
>
> In summary, what data loss happens when HDFS goes down, from the
> collector's perspective?
>
>
>
> Thanks,
>
> Jaydeep
>
>
>
> Jaydeep Ayachit | Persistent Systems Ltd
>
> Cell: +91 9822393963 | Desk: +91 712 3986747



-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department
