Subject: Re: Comparing CheckSum of Local and HDFS File
From: Shashi Vishwakarma <shashi.vish123@gmail.com>
To: user@hadoop.apache.org
Date: Tue, 18 Aug 2015 11:18:56 +0530

Thanks Gera for creating the ticket on Jira. I am a bit new to this patch
system and could not find a proper command on the ticket. Could you point me
to a command or documentation I can use for testing the checksum after
applying the patch on my cluster?
Thanks and Regards,
Shashi

On Sun, Aug 16, 2015 at 2:13 AM, Gera Shegalov <gera@shegalov.com> wrote:

> I filed https://issues.apache.org/jira/browse/HADOOP-12326 to do that;
> you can take a look at the patch. Your understanding is correct: MD5 of
> the CRCs in each block, then MD5 of those block MD5s.
>
> On Sun, Aug 9, 2015 at 7:35 AM Shashi Vishwakarma <
> shashi.vish123@gmail.com> wrote:
>
>> Hi Gera,
>>
>> Thanks for your input. I have a fairly large amount of data, and if I
>> go by the -cat option followed by an md5sum calculation, it becomes a
>> time-consuming process.
>>
>> From the code, I understand that the Hadoop checksum is essentially an
>> MD5 of MD5s of CRC32C values. I am more curious to know: if I have to
>> manually create the checksum that Hadoop computes internally, how do I
>> do that?
>>
>> Is there any document or link available that explains how this checksum
>> calculation works behind the scenes?
>>
>> Thanks
>> Shashi
>>
>> On Sat, Aug 8, 2015 at 8:00 AM, Gera Shegalov <gera@apache.org> wrote:
>>
>>> The fs checksum output has more info, like bytes per CRC and CRCs per
>>> block. See e.g.:
>>> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/MD5MD5CRC32FileChecksum.java
>>>
>>> To avoid dealing with different formatting or byte order, you could
>>> use md5sum for the remote file as well, if the file is reasonably
>>> small:
>>>
>>> hadoop fs -cat /abc.txt | md5sum
>>>
>>> On Fri, Aug 7, 2015 at 3:35 AM Shashi Vishwakarma <
>>> shashi.vish123@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> I have a small confusion regarding checksum verification. Let's say I
>>>> have a file abc.txt and I transferred this file to HDFS. How do I
>>>> ensure data integrity?
>>>>
>>>> I followed the steps below to check that the file was transferred
>>>> correctly.
>>>>
>>>> *On Local File System:*
>>>>
>>>> md5sum abc.txt
>>>>
>>>> 276fb620d097728ba1983928935d6121  TestFile
>>>>
>>>> *On Hadoop Cluster:*
>>>>
>>>> hadoop fs -checksum /abc.txt
>>>>
>>>> /abc.txt MD5-of-0MD5-of-512CRC32C
>>>> 000002000000000000000000911156a9cf0d906c56db7c8141320df0
>>>>
>>>> The two outputs look different to me. Let me know if I am doing
>>>> anything wrong.
>>>>
>>>> How do I verify that my file was transferred properly into HDFS?
>>>>
>>>> Thanks
>>>> Shashi
>>>>
>>>
>>
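For reference, the MD5-of-MD5-of-CRC32C scheme Gera describes above can be
sketched in plain Java. This is only an illustration under assumed default
settings (512 bytes per CRC with CRC32C, 128 MB dfs.blocksize), not Hadoop's
actual code path; the class name and structure here are made up, and the
authoritative logic lives in MD5MD5CRC32FileChecksum.java and the datanode's
block checksum code.

import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.security.MessageDigest;
import java.util.zip.CRC32C; // in the JDK since Java 9

public class Md5Md5Crc32c {
    static final int BYTES_PER_CRC = 512; // matches "MD5-of-512CRC32C" above
    static final long CHUNKS_PER_BLOCK =
            (128L * 1024 * 1024) / BYTES_PER_CRC; // assumed 128 MB block size

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream crcsOfBlock = new ByteArrayOutputStream();
        ByteArrayOutputStream md5sOfBlocks = new ByteArrayOutputStream();
        long chunksInBlock = 0;

        try (FileInputStream in = new FileInputStream(args[0])) {
            byte[] buf = new byte[BYTES_PER_CRC];
            int n;
            while ((n = readFully(in, buf)) > 0) {
                CRC32C crc = new CRC32C();
                crc.update(buf, 0, n);
                int v = (int) crc.getValue();
                // Each chunk CRC is stored as a 4-byte big-endian int.
                crcsOfBlock.write(v >>> 24);
                crcsOfBlock.write(v >>> 16);
                crcsOfBlock.write(v >>> 8);
                crcsOfBlock.write(v);
                if (++chunksInBlock == CHUNKS_PER_BLOCK) {
                    // End of block: MD5 over that block's concatenated CRCs.
                    md5sOfBlocks.write(md5(crcsOfBlock.toByteArray()));
                    crcsOfBlock.reset();
                    chunksInBlock = 0;
                }
            }
        }
        if (chunksInBlock > 0) { // trailing partial block
            md5sOfBlocks.write(md5(crcsOfBlock.toByteArray()));
        }
        // Final digest: MD5 over the concatenated per-block MD5s.
        System.out.println(hex(md5(md5sOfBlocks.toByteArray())));
    }

    // Read exactly one chunk's worth of bytes unless EOF comes first.
    static int readFully(FileInputStream in, byte[] buf) throws Exception {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) break;
            off += n;
        }
        return off;
    }

    static byte[] md5(byte[] data) throws Exception {
        return MessageDigest.getInstance("MD5").digest(data);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}

If the assumptions hold, `java Md5Md5Crc32c abc.txt` should print a digest
matching the last 32 hex characters of the `hadoop fs -checksum` output; in
the sample above that would be 911156a9cf0d906c56db7c8141320df0, since the
leading bytes of that output encode bytesPerCRC (00000200 = 512) and
crcPerBlock. Again, verify against the Hadoop sources before relying on it.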