hadoop-common-issues mailing list archives

From "Gera Shegalov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12326) Implement ChecksumFileSystem#getFileChecksum equivalent to HDFS for easy check
Date Fri, 21 Aug 2015 02:43:46 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706151#comment-14706151 ]

Gera Shegalov commented on HADOOP-12326:
----------------------------------------

{noformat}
$ hadoop fs -Dfs.local.block.size=134217728 -checksum file:${PWD}/local-tgen-1G.txt tgen-1G.txt
 
file:///Users/gshegalov/workspace/hadoop-common/local-tgen-1G.txt	MD5-of-262144MD5-of-512CRC32	0000020000000000000400002df57edb203ab0e106f50823215f7ad8
tgen-1G.txt	MD5-of-262144MD5-of-512CRC32C	000002000000000000040000cbff7790a91bb99e17fbd9ff561222d2

# delete the old checksum file to have it regenerated using the new algorithm
$ rm .local-tgen-1G.txt.crc

# recheck the checksums; they should match now
$ hadoop fs -Dfs.local.block.size=134217728 -checksum file:${PWD}/local-tgen-1G.txt tgen-1G.txt
 
file:///Users/gshegalov/workspace/hadoop-common/local-tgen-1G.txt	MD5-of-262144MD5-of-512CRC32C	000002000000000000040000cbff7790a91bb99e17fbd9ff561222d2
tgen-1G.txt	MD5-of-262144MD5-of-512CRC32C	000002000000000000040000cbff7790a91bb99e17fbd9ff561222d2
{noformat}
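For reference, the algorithm names printed above encode the checksum layout: `MD5-of-262144MD5-of-512CRC32C` means one CRC per 512 bytes, 262144 CRCs MD5-hashed per block, and a final MD5 over the per-block digests. A minimal sketch of that composite structure (illustrative only, not the Hadoop implementation; Python's stdlib offers only CRC32, so it stands in here for CRC32C):

```python
import hashlib
import struct
import zlib

def composite_checksum(data: bytes, bytes_per_crc: int = 512,
                       crcs_per_block: int = 262144) -> str:
    """MD5 over per-block MD5s of per-chunk CRCs (CRC32 stands in
    for Hadoop's default CRC32C, which the stdlib lacks)."""
    block_bytes = bytes_per_crc * crcs_per_block
    final = hashlib.md5()
    for b in range(0, len(data), block_bytes):
        block = data[b:b + block_bytes]
        crc_md5 = hashlib.md5()
        for c in range(0, len(block), bytes_per_crc):
            chunk = block[c:c + bytes_per_crc]
            # each 512-byte chunk contributes one 4-byte big-endian CRC
            crc = zlib.crc32(chunk) & 0xFFFFFFFF
            crc_md5.update(struct.pack(">I", crc))
        # one MD5 digest per block of CRCs
        final.update(crc_md5.digest())
    return final.hexdigest()
```

This is why matching `fs.local.block.size` to the HDFS block size matters: the block size determines how many chunk CRCs are grouped under each per-block MD5, so the grouping must agree on both sides before the final digests can compare equal.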

> Implement ChecksumFileSystem#getFileChecksum equivalent to HDFS for easy check
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-12326
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12326
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.7.1
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>         Attachments: HADOOP-12326.001.patch, HADOOP-12326.002.patch, HADOOP-12326.003.patch, HADOOP-12326.004.patch, HADOOP-12326.005.patch
>
>
> If we have same-content files, one local and one remote on HDFS (after downloading or uploading), getFileChecksum can provide a quick check of whether they are consistent. To this end, we can switch to CRC32C on the local filesystem. The difference in block sizes does not matter, because for the local filesystem it is just a logical parameter.
> {code}
> $ hadoop fs -Dfs.local.block.size=134217728 -checksum file:${PWD}/part-m-00000 part-m-00000
> 15/08/15 13:30:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> file:///Users/gshegalov/workspace/hadoop-common/part-m-00000	MD5-of-262144MD5-of-512CRC32C	000002000000000000040000e84fb07f8c9d4ef3acb5d1983a7e2a68
> part-m-00000	MD5-of-262144MD5-of-512CRC32C	000002000000000000040000e84fb07f8c9d4ef3acb5d1983a7e2a68
> {code}
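Programmatically, the quick consistency check amounts to comparing the (algorithm, digest) pairs printed for the two paths. A small helper sketch (hypothetical, not part of Hadoop), assuming the tab-separated `path<TAB>algorithm<TAB>digest` output format shown above:

```python
# Hypothetical helper: decide whether the lines printed by
# `hadoop fs -checksum` describe identical file contents.
# Assumes each line is tab-separated: path, algorithm name, hex digest.
def checksums_match(output: str) -> bool:
    pairs = set()
    for line in output.strip().splitlines():
        path, algorithm, digest = line.split("\t")
        pairs.add((algorithm, digest))
    # exactly one distinct (algorithm, digest) pair means all copies match
    return len(pairs) == 1
```

Note that both the algorithm name and the digest must agree: as the first transcript shows, a local file checksummed with CRC32 and an HDFS file checksummed with CRC32C are not comparable even for identical contents.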



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
