hadoop-hdfs-dev mailing list archives

From "Laurent Goujon (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HDFS-5798) DFSClient uses non-valid data when computing file checksum
Date Sat, 18 Jan 2014 05:52:19 GMT

     [ https://issues.apache.org/jira/browse/HDFS-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laurent Goujon resolved HDFS-5798.

    Resolution: Duplicate

> DFSClient uses non-valid data when computing file checksum
> ----------------------------------------------------------
>                 Key: HDFS-5798
>                 URL: https://issues.apache.org/jira/browse/HDFS-5798
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 1.1.2, 2.0.5-alpha
>            Reporter: Laurent Goujon
> In DFSClient.java, when computing the file checksum, the MD5 checksums of all blocks are
> fetched and written to a DataOutputBuffer instance (md5out), and the final checksum is later
> computed this way:
> {code:title=DFSClient.java}
> final MD5Hash fileMD5 = MD5Hash.digest(md5out.getData());
> {code}
> The problem is that getData() returns a buffer that is valid only up to md5out.getLength(), so
> fileMD5 is the MD5 of the per-block MD5s PLUS a bunch of trailing values (here the buffer is
> not reused, so they should be 0). How many trailing bytes there are depends on the Java
> implementation of ByteArrayOutputStream.
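
The effect described above can be reproduced with the JDK alone. The following is a minimal sketch, not Hadoop's code: `ChecksumBufferSketch`, `ExposedBuffer`, `digestWholeBuffer` and `digestValidBytes` are hypothetical names introduced for illustration, and `ExposedBuffer` only mimics a buffer that exposes its backing array through `getData()` and its written length through `getLength()`. It hashes the whole backing array and then only the valid prefix, so the two results can be compared.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class ChecksumBufferSketch {

    /** Hypothetical stand-in for md5out: exposes the backing array and the written length. */
    static final class ExposedBuffer extends ByteArrayOutputStream {
        byte[] getData() { return buf; }    // whole backing array, may be longer than what was written
        int getLength() { return count; }   // number of bytes actually written
    }

    /** MD5 over data[offset .. offset + length). */
    static byte[] md5(byte[] data, int offset, int length) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(data, offset, length);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** The reported pattern: digest the entire backing array, including unused capacity. */
    static byte[] digestWholeBuffer(ExposedBuffer out) {
        byte[] data = out.getData();
        return md5(data, 0, data.length);
    }

    /** Digest only the bytes that were actually written. */
    static byte[] digestValidBytes(ExposedBuffer out) {
        return md5(out.getData(), 0, out.getLength());
    }

    public static void main(String[] args) {
        ExposedBuffer md5out = new ExposedBuffer();
        // Stand-ins for three 16-byte per-block MD5s (assumed data, not real block checksums).
        for (int block = 0; block < 3; block++) {
            byte[] blockMd5 = md5(new byte[] {(byte) block}, 0, 1);
            md5out.write(blockMd5, 0, blockMd5.length);
        }
        System.out.println("written=" + md5out.getLength()
                + " capacity=" + md5out.getData().length);
        System.out.println("digests equal: "
                + Arrays.equals(digestWholeBuffer(md5out), digestValidBytes(md5out)));
    }
}
```

Whenever the backing array is longer than the written length, the two digests differ, and the amount of extra capacity depends on how the JDK grows the buffer. Bounding the digest by `getLength()` removes that dependency.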

This message was sent by Atlassian JIRA
