hadoop-hdfs-dev mailing list archives

From "Laurent Goujon (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-5798) DFSClient uses non-valid data when computing file checksum
Date Fri, 17 Jan 2014 23:10:25 GMT
Laurent Goujon created HDFS-5798:

             Summary: DFSClient uses non-valid data when computing file checksum
                 Key: HDFS-5798
                 URL: https://issues.apache.org/jira/browse/HDFS-5798
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 2.0.5-alpha, 1.1.2
            Reporter: Laurent Goujon

In DFSClient.java, when computing the file checksum, the MD5 checksums of all blocks are fetched
and appended to a DataOutputBuffer instance (md5out), and the final checksum is later computed like this:

final MD5Hash fileMD5 = MD5Hash.digest(md5out.getData());

The problem is that getData() returns the whole backing buffer, which is only valid up to
md5out.getLength(). As a result, fileMD5 is the MD5 of the per-block MD5s PLUS whatever trailing
bytes the buffer holds beyond that length (here the buffer is not reused, so they should be 0).
How many trailing bytes there are depends on the Java implementation of ByteArrayOutputStream.
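
The effect can be reproduced without Hadoop. The following is only a minimal sketch, not the DFSClient code: ExposedBuffer is a hypothetical stand-in for md5out that exposes the backing array and valid length of a plain ByteArrayOutputStream, and the class and method names are made up for illustration. It digests the whole backing array and then only the valid prefix, and shows that the two results differ.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class ChecksumBufferSketch {

    /** Hypothetical stand-in for md5out: exposes the backing array and the valid length. */
    static final class ExposedBuffer extends ByteArrayOutputStream {
        byte[] getData() { return buf; }    // whole backing array, including the unused tail
        int getLength() { return count; }   // number of bytes actually written
    }

    /** MD5 of data[offset .. offset + length). */
    static byte[] md5(byte[] data, int offset, int length) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(data, offset, length);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Builds a buffer holding one 16-byte digest, standing in for the per-block MD5s. */
    static ExposedBuffer sampleBuffer() {
        ExposedBuffer out = new ExposedBuffer();
        byte[] blockMd5 = md5(new byte[] {1, 2, 3}, 0, 3);
        out.write(blockMd5, 0, blockMd5.length);
        return out;
    }

    public static void main(String[] args) {
        ExposedBuffer out = sampleBuffer();

        // Pattern from the report: digest everything getData() returns.
        byte[] overWholeArray = md5(out.getData(), 0, out.getData().length);

        // Digest only the bytes that were actually written.
        byte[] overValidBytes = md5(out.getData(), 0, out.getLength());

        System.out.println("backing array length: " + out.getData().length);
        System.out.println("valid length:         " + out.getLength());
        System.out.println("digests equal:        " + Arrays.equals(overWholeArray, overValidBytes));
    }
}
```

Restricting the digest to the first getLength() bytes removes the dependency on how large the backing array happens to be.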

This message was sent by Atlassian JIRA
