incubator-chukwa-user mailing list archives

From Jaydeep Ayachit <>
Subject Data loss on collector side
Date Thu, 28 Oct 2010 14:58:39 GMT
As per the collector design, the collector accepts multiple chunks and writes each chunk to
HDFS. Only if all the chunks are written successfully does the collector send a 200 status back to the agent.
If an HDFS write fails partway through, the collector aborts the entire request and returns an exception.
This can mean the data is partially written to HDFS. I have a couple of questions:
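To make the failure mode concrete, here is a rough sketch (not the actual Chukwa code; names are mine) of the ack behavior described above: the collector acks 200 only after every chunk in the post is written, so a mid-batch HDFS failure aborts the request but may leave a partial prefix of the chunks on HDFS.

```python
# Hypothetical sketch of the collector-side ack protocol (assumed, not
# lifted from Chukwa). `hdfs_write` stands in for the real HDFS writer.
def handle_post(chunks, hdfs_write):
    written = []
    for chunk in chunks:
        try:
            hdfs_write(chunk)        # may raise if HDFS is down
            written.append(chunk)
        except IOError:
            # Abort: chunks already in `written` stay on HDFS, so the
            # agent sees an error while the data is partially committed.
            return 500, written
    return 200, written              # all chunks durable: safe to ack
```

The point of the sketch is that a non-200 response does not imply nothing was written, which is exactly what raises the duplicate question below.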

1.       The agent does not receive a 200 response. Does it resend the same data to another
collector? How does checkpointing work in this case?
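My understanding of the agent side, sketched below with hypothetical names: the checkpoint advances only after a 200 ack, so un-acked chunks are replayed (to the same or another collector) on the next attempt, which gives at-least-once delivery.

```python
# Assumed agent behavior, not the actual Chukwa implementation.
# `collectors` is a failover list of callables that post the chunks
# and return an HTTP status code.
def send_with_failover(chunks, collectors, checkpoint):
    for post in collectors:          # try each collector in turn
        if post(chunks) == 200:
            # Ack received: safe to advance the checkpoint past
            # these chunks so they are never resent.
            checkpoint["offset"] += sum(len(c) for c in chunks)
            return True
    # No collector acked: checkpoint unchanged, so the same chunks
    # are resent later -- at-least-once delivery, duplicates possible.
    return False
```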

2.       If the agent sends the same data to another collector and it reaches HDFS, some
records are duplicated. Are those duplicates filtered out when the preprocessor runs?
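If each record carries its stream identity and sequence offset (the field names below are my assumptions, not Chukwa's), a dedup pass could drop resent records by keeping only the first occurrence of each key, along these lines:

```python
# Hypothetical duplicate filter; `source`/`stream`/`seq` are assumed
# field names identifying where a record came from and its offset.
def dedup(records):
    seen = set()
    unique = []
    for rec in records:
        key = (rec["source"], rec["stream"], rec["seq"])
        if key not in seen:          # first time we see this record
            seen.add(key)
            unique.append(rec)
    return unique                    # duplicates from resends dropped
```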

In summary, from the collector's perspective, what data loss can occur when HDFS goes down?


Jaydeep Ayachit | Persistent Systems Ltd
Cell: +91 9822393963 | Desk: +91 712 3986747

