hadoop-common-user mailing list archives

From Gera Shegalov <g...@apache.org>
Subject Comparing CheckSum of Local and HDFS File
Date Sat, 08 Aug 2015 02:30:33 GMT
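The output of hadoop fs -checksum is not a plain MD5 of the file contents. It
also carries extra information such as bytes per CRC and CRCs per block, so it
cannot be compared directly with md5sum output. See e.g.: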
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/MD5MD5CRC32FileChecksum.java
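
As a rough illustration, the hex string from the quoted message below can be
split into those fields. This is only a sketch: it assumes the serialized
layout is a 4-byte bytes-per-CRC, an 8-byte CRC-per-block, and then a 16-byte
MD5, which is how I read the class linked above. The variable names are made
up for the example.

```shell
# Hex string as printed by `hadoop fs -checksum /abc.txt` in the quoted message.
checksum_hex=000002000000000000000000911156a9cf0d906c56db7c8141320df0

# Assumed layout: 8 hex digits (int), 16 hex digits (long), 32 hex digits (MD5).
bytes_per_crc_hex=$(printf '%s' "$checksum_hex" | cut -c1-8)
crc_per_block_hex=$(printf '%s' "$checksum_hex" | cut -c9-24)
md5_of_md5s=$(printf '%s' "$checksum_hex" | cut -c25-56)

# Convert the two numeric fields from hex to decimal.
bytes_per_crc=$((0x$bytes_per_crc_hex))
crc_per_block=$((0x$crc_per_block_hex))

echo "bytesPerCRC=$bytes_per_crc crcPerBlock=$crc_per_block md5=$md5_of_md5s"
```

The two numbers line up with the algorithm name printed next to the hex string
(MD5-of-0MD5-of-512CRC32C). The trailing MD5 is computed over checksums, not
over the file bytes, so it is not expected to equal the md5sum of abc.txt.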

To avoid dealing with different formatting or byte order, you could use md5sum
for the remote file as well, if the file is reasonably small:

hadoop fs -cat /abc.txt | md5sum
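
To make the comparison mechanical, the two digests can be compared in a small
script. This is a minimal sketch, not a hardened tool: the function names
md5_of_stdin and compare_local_and_hdfs are hypothetical, and the paths passed
in are placeholders.

```shell
# Print only the hex MD5 digest of whatever arrives on stdin.
md5_of_stdin() {
    md5sum | awk '{print $1}'
}

# Compare a local file ($1) with its HDFS copy ($2).
# Returns 0 when the digests match and 1 when they differ.
# Caveat: a failing `hadoop fs -cat` is not detected here; md5sum would
# simply hash whatever (possibly empty) output it received.
compare_local_and_hdfs() {
    local_md5=$(md5_of_stdin < "$1") || return 2
    hdfs_md5=$(hadoop fs -cat "$2" | md5_of_stdin) || return 2
    echo "local: $local_md5"
    echo "hdfs:  $hdfs_md5"
    [ "$local_md5" = "$hdfs_md5" ]
}
```

For example: compare_local_and_hdfs abc.txt /abc.txt && echo "contents match".
Note that this streams the whole file through the client, which is why it only
makes sense for reasonably small files.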

On Fri, Aug 7, 2015 at 3:35 AM Shashi Vishwakarma <shashi.vish123@gmail.com>
wrote:

> Hi
>
> I am a little confused about checksum verification. Let's say I have a
> file abc.txt and I transferred it to HDFS. How do I ensure data
> integrity?
>
> I followed the steps below to check that the file was transferred correctly.
>
> *On Local File System:*
>
> md5sum abc.txt
>
> 276fb620d097728ba1983928935d6121  TestFile
>
> *On Hadoop Cluster:*
>
>  hadoop fs -checksum /abc.txt
>
> /abc.txt      MD5-of-0MD5-of-512CRC32C
>  000002000000000000000000911156a9cf0d906c56db7c8141320df0
>
> The two outputs look different to me. Let me know if I am doing anything
> wrong.
>
> How do I verify that my file was transferred properly into HDFS?
>
> Thanks
> Shashi
>