hadoop-mapreduce-user mailing list archives

From Shashi Vishwakarma <shashi.vish...@gmail.com>
Subject Re: Comparing CheckSum of Local and HDFS File
Date Tue, 18 Aug 2015 05:48:56 GMT
Thanks, Gera, for creating the ticket on Jira. I am a bit new to this patch
system and could not find a suitable command on the ticket. Could you share
a command or documentation that I can use to test the checksum after
applying the patch on my cluster?

Thanks and Regards,
Shashi

On Sun, Aug 16, 2015 at 2:13 AM, Gera Shegalov <gera@shegalov.com> wrote:

> I filed https://issues.apache.org/jira/browse/HADOOP-12326 to do that;
> you can take a look at the patch. Your understanding is correct: an MD5 of
> the CRCs in each block, then an MD5 of those per-block MD5s.
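
[Editor's note: to make the scheme above concrete, here is a minimal,
hypothetical Java sketch of the same MD5-of-MD5-of-CRC32C computation for a
local file. It is not the actual Hadoop code; it assumes CRC32C checksums,
512 bytes per CRC, and a 128 MB block size, where the real values come from
dfs.bytes-per-checksum and dfs.blocksize on the cluster.

import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.util.zip.CRC32C; // requires Java 9+

public class LocalMd5Md5Crc32c {
    public static void main(String[] args) throws Exception {
        final int bytesPerCrc = 512;               // dfs.bytes-per-checksum
        final long blockSize = 128L * 1024 * 1024; // dfs.blocksize
        final long crcsPerBlock = blockSize / bytesPerCrc;

        MessageDigest fileMd5 = MessageDigest.getInstance("MD5");
        MessageDigest blockMd5 = MessageDigest.getInstance("MD5");
        CRC32C crc = new CRC32C();
        byte[] chunk = new byte[bytesPerCrc];
        long crcsInBlock = 0;

        try (FileInputStream in = new FileInputStream(args[0])) {
            int n;
            while ((n = in.readNBytes(chunk, 0, bytesPerCrc)) > 0) {
                crc.reset();
                crc.update(chunk, 0, n);
                // each chunk CRC is fed in as a 4-byte big-endian integer
                blockMd5.update(ByteBuffer.allocate(4)
                        .putInt((int) crc.getValue()).array());
                if (++crcsInBlock == crcsPerBlock) {
                    fileMd5.update(blockMd5.digest()); // MD5 of this block's CRCs
                    crcsInBlock = 0;
                }
            }
        }
        if (crcsInBlock > 0) {
            fileMd5.update(blockMd5.digest()); // MD5 of the final partial block
        }

        StringBuilder hex = new StringBuilder();
        for (byte b : fileMd5.digest()) {
            hex.append(String.format("%02x", b));
        }
        // should match the trailing 32 hex digits of 'hadoop fs -checksum'
        System.out.println(hex);
    }
}

On an older cluster the inner checksum would be CRC32 (java.util.zip.CRC32)
rather than CRC32C, and the result is only comparable when both sides use
the same bytes-per-CRC and block size.]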
>
> On Sun, Aug 9, 2015 at 7:35 AM Shashi Vishwakarma <
> shashi.vish123@gmail.com> wrote:
>
>> Hi Gera,
>>
>> Thanks for your input. I have a fairly large amount of data, and if I go
>> with the -cat option followed by an md5sum calculation, it will become a
>> time-consuming process.
>>
>> From the code I could understand that the Hadoop checksum is essentially
>> an MD5 of MD5s of CRC32C values, which is then returned as the output. I
>> would be curious to know: if I have to create manually the checksum that
>> Hadoop computes internally, how do I do that?
>>
>> Is there any document or link available that explains how this checksum
>> calculation works behind the scenes?
>>
>> Thanks
>> Shashi
>>
>> On Sat, Aug 8, 2015 at 8:00 AM, Gera Shegalov <gera@apache.org> wrote:
>>
>>> The fs -checksum output encodes more information, such as bytes per CRC
>>> and CRCs per block.
>>> See e.g.:
>>> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/MD5MD5CRC32FileChecksum.java
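
[Editor's note: judging from the serialization in that class (an assumption
based on reading the code, not an official format spec), the hex string
printed by -checksum is a 4-byte int bytesPerCRC, an 8-byte long
crcPerBlock, then the 16-byte MD5 digest. A hypothetical Java snippet
splitting the example value quoted further down the thread:

import java.math.BigInteger;

public class DecodeHdfsChecksum {
    public static void main(String[] args) {
        // example output of 'hadoop fs -checksum /abc.txt' from below
        String hex = "000002000000000000000000911156a9cf0d906c56db7c8141320df0";
        int bytesPerCrc = new BigInteger(hex.substring(0, 8), 16).intValue();    // 512
        long crcPerBlock = new BigInteger(hex.substring(8, 24), 16).longValue(); // 0
        String md5 = hex.substring(24); // the 32-hex-digit MD5-of-MD5s digest
        System.out.printf("bytesPerCRC=%d crcPerBlock=%d md5=%s%n",
                bytesPerCrc, crcPerBlock, md5);
    }
}

Those first two fields also appear in the algorithm name
"MD5-of-0MD5-of-512CRC32C", which is why the raw hex never equals a plain
md5sum of the file contents.]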
>>>
>>> In order to avoid dealing with different formatting or byte order, you
>>> could use md5sum for the remote file as well, if the file is reasonably
>>> small:
>>>
>>> hadoop fs -cat /abc.txt | md5sum
>>>
>>> On Fri, Aug 7, 2015 at 3:35 AM Shashi Vishwakarma <
>>> shashi.vish123@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> I have a small confusion regarding checksum verification. Let's say I
>>>> have a file abc.txt and I transferred this file to HDFS. How do I
>>>> ensure data integrity?
>>>>
>>>> I followed the steps below to check that the file was transferred
>>>> correctly.
>>>>
>>>> *On Local File System:*
>>>>
>>>> md5sum abc.txt
>>>>
>>>> 276fb620d097728ba1983928935d6121  TestFile
>>>>
>>>> *On Hadoop Cluster :*
>>>>
>>>>  hadoop fs -checksum /abc.txt
>>>>
>>>> /abc.txt      MD5-of-0MD5-of-512CRC32C
>>>>  000002000000000000000000911156a9cf0d906c56db7c8141320df0
>>>>
>>>> The two outputs look different to me. Let me know if I am doing
>>>> anything wrong.
>>>>
>>>> How do I verify if my file is transferred properly into HDFS?
>>>>
>>>> Thanks
>>>> Shashi
>>>>
>>>
>>
