From: reena upadhyay
To: "user@hadoop.apache.org"
Subject: How check sum are generated for blocks in data node
Date: Fri, 28 Mar 2014 15:58:51 +0530

I was going through this link: http://stackoverflow.com/questions/9406477/data-integrity-in-hdfs-which-data-nodes-verifies-the-checksum . It says that in recent versions of Hadoop only the last data node verifies the checksum, because the write happens in a pipeline fashion.

Now I have a question:

Assume my cluster has two data nodes, A and B, and I have a file. Half of the file content is written to the first data node A and the remaining half is written to the second data node B, to take advantage of parallelism. My question is: will data node A not store the checksum for the blocks stored on it?

Going by the line "only the last data node verifies the checksum", it looks like only the last data node, which in my case is data node B, will generate the checksum. But if only data node B generates the checksum, then it will generate it only for the blocks stored on data node B. What about the checksum for the data blocks on data node A?
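
To make the scenario concrete, here is a toy model in Python. It is only a sketch under assumptions that nothing in this message confirms: that the writer computes one CRC per fixed-size chunk and sends the checksums along with the data, that a pipeline is built per block (so a node that is last for one block may be first for another), and that every node in the pipeline stores what it receives while only the last one verifies it. `ToyDataNode`, `write_block`, `chunk_checksums` and the 512-byte chunk size are made-up names and values for illustration, not Hadoop APIs.

```python
import zlib
from dataclasses import dataclass, field

BYTES_PER_CHECKSUM = 512  # illustrative chunk size, an assumption


def chunk_checksums(data, chunk_size=BYTES_PER_CHECKSUM):
    """Return one CRC32 per fixed-size chunk of `data`."""
    return [zlib.crc32(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


@dataclass
class ToyDataNode:
    """Hypothetical stand-in for a data node; not a Hadoop class."""
    name: str
    blocks: dict = field(default_factory=dict)    # block id -> (data, checksums)
    verified: list = field(default_factory=list)  # block ids this node verified

    def receive(self, block_id, data, checksums, is_last):
        # Assumption: only the last node in the pipeline recomputes and compares.
        if is_last:
            if chunk_checksums(data) != checksums:
                raise ValueError("checksum mismatch for " + block_id)
            self.verified.append(block_id)
        # Every node stores the data together with the checksums it was sent.
        self.blocks[block_id] = (data, checksums)


def write_block(block_id, data, pipeline):
    """Writer computes the checksums once, then sends the block down the pipeline."""
    checksums = chunk_checksums(data)
    for position, node in enumerate(pipeline):
        node.receive(block_id, data, checksums,
                     is_last=(position == len(pipeline) - 1))


if __name__ == "__main__":
    node_a, node_b = ToyDataNode("A"), ToyDataNode("B")
    # Two blocks of one file, each with its own pipeline order.
    write_block("blk_1", b"x" * 1300, [node_a, node_b])
    write_block("blk_2", b"y" * 700, [node_b, node_a])
    for node in (node_a, node_b):
        print(node.name, "stores", sorted(node.blocks), "verified", node.verified)
```

Under these assumptions, "only the last data node verifies" describes who re-checks the data during a write, not who holds checksums: in the toy model both A and B end up with checksums for every block they store.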