Subject: Re: DistCp CRC failure modes
From: Akira
AJISAKA
To: user@hadoop.apache.org, "hdfs-dev@hadoop.apache.org"
Message-ID: <5720E832.3030609@oss.nttdata.co.jp>
Date: Thu, 28 Apr 2016 01:26:26 +0900

(Added hdfs-dev ML)

Thanks, Elliot, for reporting this issue. I think this is not by design, so we should fix it. Would you file a JIRA for this issue?

https://issues.apache.org/jira/browse/HDFS/

If you don't have time to do so, I'll file it on your behalf.

Regards,
Akira

On 4/27/16 22:43, Elliot West wrote:
> Hello,
>
> We are using DistCp V2 to replicate data between two HDFS file systems.
> We had been working on the assumption that we could rely on CRC checks
> to ensure that the data was replicated correctly. However, after
> examining the DistCp source code, it seems that there are edge cases
> where the CRCs could differ and yet the copy succeeds, even when we are
> not skipping CRC checks.
>
> I'm wondering whether this is by design and, if so, the reasoning
> behind it. If this is a bug, I'd like to raise an issue to fix it. If
> it is by design, I'd like to propose the introduction of an option for
> stricter CRC checks.
>
> The code in question is contained in the method:
>
> org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)
>
> which can be seen here:
>
> https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457
>
> Specifically, this code block suggests that if there is a failure when
> trying to read the source or target checksum, the method will return
> 'true', implying that the check succeeded. In actual fact, we merely
> failed to obtain the checksums and could perform no check.
>
> try {
>   sourceChecksum = sourceChecksum != null ? sourceChecksum : sourceFS
>       .getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or " +
>       target, e);
> }
> return (sourceChecksum == null || targetChecksum == null ||
>     sourceChecksum.equals(targetChecksum));
>
> Ideally, I'd like to be able to configure a check that requires both
> the source and target CRCs to be retrieved and compared, and that
> throws an exception if either CRC retrieval fails for any reason. I do
> appreciate that some FileSystems cannot return CRCs, but these could
> still be handled correctly, as they would simply return null and not
> throw an exception (I assume).
>
> I'd appreciate any thoughts on this matter.
>
> Elliot.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
For additional commands, e-mail: user-help@hadoop.apache.org
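The stricter behaviour proposed above could be sketched roughly as follows. This is not the actual DistCp or FileSystem API: `ChecksumSource` and `checksumsAreEqualStrict` are hypothetical stand-ins for `FileSystem.getFileChecksum` and `DistCpUtils#checksumsAreEqual`, used only to illustrate the intended control flow, namely that a retrieval failure propagates instead of being swallowed, while a filesystem that legitimately has no checksum (returns null) still passes.

```java
import java.io.IOException;
import java.util.Arrays;

public class StrictChecksumCheck {

    /** Hypothetical stand-in for FileSystem.getFileChecksum: may return null or throw. */
    interface ChecksumSource {
        byte[] getFileChecksum() throws IOException;
    }

    /**
     * Unlike the snippet quoted above, there is no try/catch here: an
     * IOException while fetching either checksum aborts the comparison
     * rather than silently counting as a successful check.
     */
    static boolean checksumsAreEqualStrict(ChecksumSource source,
                                           ChecksumSource target)
            throws IOException {
        byte[] sourceChecksum = source.getFileChecksum();
        byte[] targetChecksum = target.getFileChecksum();

        // A null checksum means the filesystem cannot produce one at all;
        // that is not a failure, so the comparison is skipped in that case.
        if (sourceChecksum == null || targetChecksum == null) {
            return true;
        }
        return Arrays.equals(sourceChecksum, targetChecksum);
    }

    public static void main(String[] args) throws IOException {
        ChecksumSource a = () -> new byte[] {1, 2, 3};
        ChecksumSource b = () -> new byte[] {1, 2, 3};
        ChecksumSource c = () -> new byte[] {9, 9, 9};
        ChecksumSource broken = () -> { throw new IOException("fetch failed"); };

        System.out.println(checksumsAreEqualStrict(a, b)); // true
        System.out.println(checksumsAreEqualStrict(a, c)); // false
        try {
            checksumsAreEqualStrict(a, broken);
            System.out.println("no exception");
        } catch (IOException e) {
            System.out.println("IOException propagated"); // this branch runs
        }
    }
}
```

Such a mode would presumably sit behind a new DistCp option, leaving the current lenient behaviour as the default for backwards compatibility.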