hadoop-hdfs-issues mailing list archives

From "Laurent Goujon (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HDFS-5798) DFSClient uses non-valid data when computing file checksum
Date Sat, 18 Jan 2014 05:52:19 GMT

     [ https://issues.apache.org/jira/browse/HDFS-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laurent Goujon resolved HDFS-5798.
----------------------------------

    Resolution: Duplicate

> DFSClient uses non-valid data when computing file checksum
> ----------------------------------------------------------
>
>                 Key: HDFS-5798
>                 URL: https://issues.apache.org/jira/browse/HDFS-5798
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 1.1.2, 2.0.5-alpha
>            Reporter: Laurent Goujon
>
> In DFSClient.java, when computing the file checksum, the MD5 checksum of each block is
> fetched and written to a DataOutputStream instance (md5out), and the final checksum is
> later computed this way:
> {code:title=DFSClient.java}
> final MD5Hash fileMD5 = MD5Hash.digest(md5out.getData());
> {code}
> The problem is that getData() returns a buffer that is valid only up to md5out.getLength(),
> so fileMD5 is the MD5 of the per-block MD5s PLUS whatever trailing bytes sit in the unused
> part of the buffer (here the buffer is not reused, so they should all be 0). How many such
> bytes there are depends on the Java implementation of ByteArrayOutputStream.
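
The effect described above can be reproduced without Hadoop. The following is only a minimal sketch: `ExposedBuffer` is a hypothetical stand-in for the buffer behind md5out (it is not the Hadoop class), and the per-block digests are made up for illustration. It compares digesting the whole backing array with digesting only the first getLength() bytes.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class ChecksumBufferDemo {
    /** Hypothetical stand-in for md5out: exposes the raw backing array and the valid length. */
    static final class ExposedBuffer extends ByteArrayOutputStream {
        ExposedBuffer(int capacity) { super(capacity); }
        byte[] getData() { return buf; }      // whole backing array, including unused tail
        int getLength() { return count; }     // number of bytes actually written
    }

    /** MD5 over data[off .. off+len). */
    static byte[] md5(byte[] data, int off, int len) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(data, off, len);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes three made-up 16-byte block digests into a 64-byte buffer. */
    static ExposedBuffer fillWithBlockDigests() {
        ExposedBuffer out = new ExposedBuffer(64);
        for (int block = 0; block < 3; block++) {
            byte[] blockMd5 = md5(new byte[] {(byte) block}, 0, 1);
            out.write(blockMd5, 0, blockMd5.length);
        }
        return out;
    }

    public static void main(String[] args) {
        ExposedBuffer md5out = fillWithBlockDigests();
        byte[] data = md5out.getData();
        // Digest of the whole backing array: includes the unused trailing bytes.
        byte[] overWholeArray = md5(data, 0, data.length);
        // Digest restricted to the bytes that were actually written.
        byte[] overValidBytes = md5(data, 0, md5out.getLength());
        System.out.println("backing array: " + data.length
                + " bytes, valid: " + md5out.getLength() + " bytes");
        System.out.println("digests equal: " + Arrays.equals(overWholeArray, overValidBytes));
    }
}
```

In this sketch the initial capacity is fixed at 64 so the size of the unused tail is known; when the buffer has to grow on its own, the tail size is decided by the ByteArrayOutputStream implementation, which is why the resulting file checksum is not portable.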



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
