hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Reg HDFS checksum
Date Tue, 12 Apr 2011 15:52:43 GMT
On 12/04/2011 07:06, Josh Patterson wrote:
> If you take a look at:
> https://github.com/jpatanooga/IvoryMonkey/blob/master/src/tv/floe/IvoryMonkey/hadoop/fs/ExternalHDFSChecksumGenerator.java
> you'll see a single-process version of what HDFS does under the hood,
> albeit in a highly distributed fashion. What's going on here is that
> for every 512 bytes a CRC32 is calculated and saved at each local
> datanode for that block. When the "checksum" is requested, these
> CRC32s are pulled together and MD5-hashed, and the result is sent to
> the client process. The client process then MD5-hashes all of these
> hashes together to produce a final hash.
> For some context: our purpose on the openPDC project for this was that
> we had some legacy software writing to HDFS through an FTP proxy bridge:
> https://openpdc.svn.codeplex.com/svn/Hadoop/Current%20Version/HdfsBridge/
> Since the openPDC data was ultra-critical in that we could not lose
> *any* data, and the team wanted to use a simple FTP client lib to
> write to HDFS (least amount of work for them, standard libs), we
> needed a way to make sure that no corruption occurred during the "hop"
> through the FTP bridge (it acted as an intermediary to DFSClient;
> something could fail and the file might be slightly truncated, yet
> this would be hard to detect). In the FTP bridge we allowed a custom
> FTP command to call the now-exposed "hdfs-checksum" command, and the
> sending agent could then compute the hash locally (in the case of the
> openPDC it was done in C#) and make sure the file made it there
> intact. This system has been in production for over a year now,
> storing and maintaining smart grid data, and has been highly reliable.
> I say all of this to say: after having dug through HDFS's checksumming
> code, I am pretty confident that it's Good Stuff, although I don't
> claim to be a filesystem expert by any means. It may just be some
> simple error or oversight in your process, possibly?
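[For readers following along, the MD5-of-CRC32 scheme described above can be sketched in a single process. This is a hedged approximation, not the exact wire format of Hadoop's MD5MD5CRC32FileChecksum: the 512-byte chunk size matches the HDFS default, but the block size here is illustrative and the real checksum carries extra header fields this sketch omits.]

```python
# Single-process sketch of the MD5-of-MD5-of-CRC32 scheme described
# above. Assumptions: 512-byte CRC chunks (the HDFS default for
# io.bytes.per.checksum) and an illustrative 64 MB block size; the
# real DFSClient gathers the per-block MD5s from the datanodes.
import hashlib
import struct
import zlib

BYTES_PER_CRC = 512            # HDFS default io.bytes.per.checksum
BLOCK_SIZE = 64 * 1024 * 1024  # illustrative block size (assumption)

def block_md5(block: bytes) -> bytes:
    """MD5 over the CRC32s of each 512-byte chunk of one block."""
    md5 = hashlib.md5()
    for off in range(0, len(block), BYTES_PER_CRC):
        crc = zlib.crc32(block[off:off + BYTES_PER_CRC]) & 0xFFFFFFFF
        md5.update(struct.pack(">I", crc))  # each CRC as 4 big-endian bytes
    return md5.digest()

def file_checksum(data: bytes, block_size: int = BLOCK_SIZE) -> str:
    """MD5 over the per-block MD5s -- the final hash the client computes."""
    md5 = hashlib.md5()
    for off in range(0, len(data), block_size):
        md5.update(block_md5(data[off:off + block_size]))
    return md5.hexdigest()
```

Because every byte feeds into some chunk CRC, even a one-byte truncation changes a chunk's CRC32, which changes that block's MD5 and hence the final hash, which is exactly the property the FTP-bridge check relies on.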

Assuming it came down over HTTP, it's perfectly conceivable that 
something went wrong on the way, especially if a proxy server got 
involved. All HTTP checks is that the (optional) Content-Length is 
consistent with what arrived; it relies on TCP checksums, which verify 
that the network links work, but not the other parts of the system in 
the way (such as a proxy server).
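[A minimal sketch of the only check plain HTTP gives you: comparing the optional Content-Length header against the bytes that actually arrived. The function name is hypothetical; the point is that a proxy which truncates the body *and* rewrites or drops the header passes this check, which is why an end-to-end checksum like the one above is still needed.]

```python
def length_consistent(headers: dict, body: bytes) -> bool:
    """Return True if the (optional) Content-Length matches the body size.

    Hypothetical helper: headers is a plain dict of header name -> value.
    A missing Content-Length is treated as 'nothing to verify'.
    """
    declared = headers.get("Content-Length")
    if declared is None:
        return True  # header is optional; HTTP verifies nothing here
    return int(declared) == len(body)
```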
