[ https://issues.apache.org/jira/browse/HDFS-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996481#comment-12996481 ]

Steve Loughran commented on HDFS-1630:
--------------------------------------

@Hairong - yes, if every transaction is checksummed, that's all that is needed. But I think the remote node receiving any checksummed transaction should have the right to report the problem so that the namenode can then replay it. I don't think the risk of corruption is that high, but statistics is the enemy here: eventually some NIC with built-in TCP checksum support will start playing up, and transactions will get corrupted before the packet checksum is generated, at which point that checksum is of no use to the recipient. If the secondary/backup node can check the transaction checksum on receipt and refuse to replay anything corrupt, problems get found faster and the faulty hardware is more easily located.

> Checksum fsedits
> ----------------
>
>                 Key: HDFS-1630
>                 URL: https://issues.apache.org/jira/browse/HDFS-1630
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>
> HDFS-903 computes an MD5 checksum for a saved image, so that we can verify the integrity of the image at load time.
> The other half of the story is how to verify fsedits. Similarly, we could use the checksum approach, but since an fsedits file grows constantly, a checksum per file does not work. I am thinking of adding a checksum per transaction. Is that doable, or too expensive?
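To make the per-transaction idea concrete, here is a minimal sketch of one possible record framing: [length][payload][checksum], verified before the transaction is replayed. CRC32 stands in for whatever digest is ultimately chosen (the image side of HDFS-903 uses MD5), and the class and method names here are hypothetical illustrations, not the actual FSEditLog API. The read side fails fast on a mismatch, which matches the "check on receipt and don't replay" behaviour suggested above.

{code:java}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;

/**
 * Hypothetical sketch of a per-transaction checksum for an edit log.
 * Each record is written as: [4-byte length][payload][4-byte CRC32].
 */
public class ChecksummedEditLog {

  /** Write one transaction record with a trailing checksum of its payload. */
  public static void writeTxn(DataOutputStream out, byte[] txnPayload)
      throws IOException {
    CRC32 crc = new CRC32();
    crc.update(txnPayload, 0, txnPayload.length);
    out.writeInt(txnPayload.length);
    out.write(txnPayload);
    out.writeInt((int) crc.getValue()); // 4 extra bytes per transaction
  }

  /**
   * Read one transaction record, verifying the checksum before returning.
   * A mismatch throws, so the receiver can report corruption to the
   * namenode instead of replaying a bad transaction.
   */
  public static byte[] readTxn(DataInputStream in) throws IOException {
    int len = in.readInt();
    byte[] payload = new byte[len];
    in.readFully(payload);
    int storedCrc = in.readInt();
    CRC32 crc = new CRC32();
    crc.update(payload, 0, payload.length);
    if ((int) crc.getValue() != storedCrc) {
      throw new IOException("Edit log transaction checksum mismatch");
    }
    return payload;
  }
}
{code}

On cost: CRC32 over a few hundred bytes per transaction is cheap relative to the serialization and fsync work the edit log already does, and the overhead is a fixed 4 bytes per record, so this sketch suggests the per-transaction approach need not be expensive.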