hadoop-hdfs-dev mailing list archives

From "Laurent Goujon (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-5798) DFSClient uses non-valid data when computing file checksum
Date Fri, 17 Jan 2014 23:10:25 GMT
Laurent Goujon created HDFS-5798:
------------------------------------

             Summary: DFSClient uses non-valid data when computing file checksum
                 Key: HDFS-5798
                 URL: https://issues.apache.org/jira/browse/HDFS-5798
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 2.0.5-alpha, 1.1.2
            Reporter: Laurent Goujon


In DFSClient.java, when computing the file checksum, the MD5 checksum of each block is fetched
and written to a DataOutputStream instance (md5out). The final checksum is later computed this
way:

{code:title=DFSClient.java}
final MD5Hash fileMD5 = MD5Hash.digest(md5out.getData());
{code}

The problem is that getData() returns the whole backing buffer, of which only the first
md5out.getLength() bytes are valid. As a result, fileMD5 is the MD5 of the per-block MD5s PLUS
whatever trailing bytes sit in the unused part of the buffer (here the buffer is not reused, so
they should all be 0). How many trailing bytes there are depends on the Java implementation of
ByteArrayOutputStream.
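
The effect can be reproduced with the JDK alone. The sketch below is an illustration, not the
DFSClient code: ExposedBuffer is a hypothetical stand-in for md5out that exposes the backing
array the way getData() is described above, and the "block" content is made up. It compares a
digest over the whole array with a digest limited to the valid length.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class ChecksumBufferDemo {

    /** Hypothetical stand-in for md5out: exposes the raw backing array and the valid length. */
    static final class ExposedBuffer extends ByteArrayOutputStream {
        byte[] getData() { return buf; }    // whole backing array, including unused capacity
        int getLength() { return count; }   // number of bytes actually written
    }

    /** MD5 over data[offset, offset + length). */
    static byte[] md5(byte[] data, int offset, int length) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(data, offset, length);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** Fills a buffer with one made-up 16-byte block checksum. */
    static ExposedBuffer bufferWithOneBlockChecksum() {
        byte[] block = "block-0".getBytes(StandardCharsets.UTF_8);
        byte[] blockMd5 = md5(block, 0, block.length);
        ExposedBuffer out = new ExposedBuffer();
        out.write(blockMd5, 0, blockMd5.length);
        return out;
    }

    public static void main(String[] args) {
        ExposedBuffer md5out = bufferWithOneBlockChecksum();

        // Digest over the whole backing array: includes the unused trailing bytes.
        byte[] overWholeArray = md5(md5out.getData(), 0, md5out.getData().length);
        // Digest restricted to the bytes that were actually written.
        byte[] overValidBytes = md5(md5out.getData(), 0, md5out.getLength());

        System.out.println("valid bytes:   " + md5out.getLength());
        System.out.println("array length:  " + md5out.getData().length);
        System.out.println("digests equal: " + Arrays.equals(overWholeArray, overValidBytes));
    }
}
```

With the default ByteArrayOutputStream capacity, one 16-byte block checksum leaves unused bytes
at the end of the array, so the two digests differ. When the written length happens to fill the
array exactly, both digests agree, which is why the result depends on the buffer's growth policy.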




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
