hadoop-common-issues mailing list archives

From "Gera Shegalov (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-12326) Implement ChecksumFileSystem#getFileChecksum equivalent to HDFS for easy check
Date Sat, 15 Aug 2015 20:30:45 GMT
Gera Shegalov created HADOOP-12326:
--------------------------------------

             Summary: Implement ChecksumFileSystem#getFileChecksum equivalent to HDFS for easy check
                 Key: HADOOP-12326
                 URL: https://issues.apache.org/jira/browse/HADOOP-12326
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs
    Affects Versions: 2.7.1
            Reporter: Gera Shegalov
            Assignee: Gera Shegalov


If we have files with the same content, one local and one remote on HDFS (for example, after
a download or an upload), getFileChecksum can provide a quick check of whether they are
consistent. To this end, we can switch the local filesystem to CRC32C, the default checksum
type on HDFS. The difference in block sizes does not matter, because for the local filesystem
the block size is just a logical parameter that can be set to match HDFS.
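For illustration, a minimal Java sketch of how a checksum named like MD5-of-262144MD5-of-512CRC32C can be composed, assuming the HDFS scheme: a CRC32C for every bytesPerCRC-byte chunk, an MD5 over each block's CRCs, then an MD5 over the concatenated block MD5s. It works on an in-memory byte array rather than a FileSystem. The class and method names are hypothetical, the 4-byte big-endian CRC encoding is an assumption, and edge cases such as empty files are not modeled. java.util.zip.CRC32C requires Java 9 or later.

{code:java}
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32C;

public class Md5Md5Crc32cSketch {

  /** Every Java platform must provide MD5, so failure here is unexpected. */
  static MessageDigest newMd5() {
    try {
      return MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  /** MD5 over the CRC32C of each bytesPerCrc-sized chunk of data[from, to). */
  static byte[] blockMd5(byte[] data, int from, int to, int bytesPerCrc) {
    MessageDigest md5 = newMd5();
    CRC32C crc = new CRC32C();
    ByteBuffer crcBytes = ByteBuffer.allocate(4); // ByteBuffer is big-endian by default
    for (int off = from; off < to; off += bytesPerCrc) {
      int len = Math.min(bytesPerCrc, to - off); // the last chunk may be shorter
      crc.reset();
      crc.update(data, off, len);
      crcBytes.clear();
      crcBytes.putInt((int) crc.getValue());
      md5.update(crcBytes.array());
    }
    return md5.digest();
  }

  /** MD5 over the concatenated per-block MD5s; blockSize is a purely logical split. */
  static byte[] fileChecksum(byte[] data, int bytesPerCrc, int blockSize) {
    MessageDigest fileMd5 = newMd5();
    for (int start = 0; start < data.length; start += blockSize) {
      int end = Math.min(start + blockSize, data.length);
      fileMd5.update(blockMd5(data, start, end, bytesPerCrc));
    }
    return fileMd5.digest();
  }

  /** Builds the algorithm label; crcPerBlock = blockSize / bytesPerCrc. */
  static String algorithmName(int bytesPerCrc, int blockSize) {
    return "MD5-of-" + (blockSize / bytesPerCrc) + "MD5-of-" + bytesPerCrc + "CRC32C";
  }

  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    byte[] data = new byte[3000];
    for (int i = 0; i < data.length; i++) {
      data[i] = (byte) i;
    }
    // Same bytes, two different logical block sizes.
    for (int blockSize : new int[] {1024, 2048}) {
      System.out.println(algorithmName(512, blockSize) + "\t"
          + hex(fileChecksum(data, 512, blockSize)));
    }
  }
}
{code}

Because block boundaries feed into the outer MD5, both sides must agree on the block size for the values to be comparable; on the local filesystem that is only a setting, not a property of the stored file.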

{code}
$ hadoop fs -Dfs.local.block.size=134217728 -checksum file:${PWD}/part-m-00000 part-m-00000
15/08/15 13:30:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
file:///Users/gshegalov/workspace/hadoop-common/part-m-00000	MD5-of-262144MD5-of-512CRC32C	000002000000000000040000e84fb07f8c9d4ef3acb5d1983a7e2a68
part-m-00000	MD5-of-262144MD5-of-512CRC32C	000002000000000000040000e84fb07f8c9d4ef3acb5d1983a7e2a68
{code}
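The hex value in the output can be read back, assuming it is the serialized form of Hadoop's MD5MD5CRC32FileChecksum: a 4-byte bytesPerCRC, an 8-byte crcPerBlock and the 16-byte MD5, all big-endian. The recovered numbers (512 and 262144) agree with the algorithm column, and their product is the 134217728 passed as fs.local.block.size. ChecksumBytesDecoder is a hypothetical helper for illustration, not a Hadoop API, and the CRC type is not carried in these bytes, so it is passed in.

{code:java}
import java.nio.ByteBuffer;
import java.util.Locale;

public class ChecksumBytesDecoder {

  /** Fields assumed to be serialized as: int bytesPerCRC, long crcPerBlock, 16-byte MD5. */
  static final class Decoded {
    final int bytesPerCrc;
    final long crcPerBlock;
    final String md5Hex;

    Decoded(int bytesPerCrc, long crcPerBlock, String md5Hex) {
      this.bytesPerCrc = bytesPerCrc;
      this.crcPerBlock = crcPerBlock;
      this.md5Hex = md5Hex;
    }

    /** Rebuilds the algorithm column from the decoded counters. */
    String algorithmName(String crcType) {
      return "MD5-of-" + crcPerBlock + "MD5-of-" + bytesPerCrc + crcType;
    }

    /** Block size implied by bytesPerCRC * crcPerBlock. */
    long impliedBlockSize() {
      return bytesPerCrc * crcPerBlock;
    }
  }

  static byte[] fromHex(String hex) {
    if (hex.length() % 2 != 0) {
      throw new IllegalArgumentException("odd-length hex string");
    }
    byte[] out = new byte[hex.length() / 2];
    for (int i = 0; i < out.length; i++) {
      int hi = Character.digit(hex.charAt(2 * i), 16);
      int lo = Character.digit(hex.charAt(2 * i + 1), 16);
      if (hi < 0 || lo < 0) {
        throw new IllegalArgumentException("not a hex string: " + hex);
      }
      out[i] = (byte) ((hi << 4) | lo);
    }
    return out;
  }

  static Decoded decode(String hex) {
    byte[] bytes = fromHex(hex);
    if (bytes.length != 4 + 8 + 16) {
      throw new IllegalArgumentException("expected 28 bytes, got " + bytes.length);
    }
    ByteBuffer buf = ByteBuffer.wrap(bytes); // big-endian, matching java.io.DataOutput
    int bytesPerCrc = buf.getInt();
    long crcPerBlock = buf.getLong();
    // The remaining 16 bytes (32 hex characters) are the MD5.
    return new Decoded(bytesPerCrc, crcPerBlock, hex.substring(24).toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    Decoded d = decode("000002000000000000040000e84fb07f8c9d4ef3acb5d1983a7e2a68");
    System.out.println(d.algorithmName("CRC32C")); // MD5-of-262144MD5-of-512CRC32C
    System.out.println(d.impliedBlockSize());      // 134217728
    System.out.println(d.md5Hex);                  // e84fb07f8c9d4ef3acb5d1983a7e2a68
  }
}
{code}

Since the parameter prefix and the digest travel together, comparing the full strings from both sides checks the chunking parameters as well as the MD5.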



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
