From: "Koji Noguchi (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Fri, 15 Jun 2007 11:23:26 -0700 (PDT)
Subject: [jira] Issue Comment Edited: (HADOOP-1491) After successful distcp, couple of checksum error files
Message-ID: <6694322.1181931806671.JavaMail.jira@brutus>
In-Reply-To: <29776539.1181776346535.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/HADOOP-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505336 ]

Koji Noguchi edited comment on HADOOP-1491 at 6/15/07 11:21 AM:
----------------------------------------------------------------

To confirm Dhruba and Raghu's analysis, I inserted a debug print statement inside DFSClient.newBackupFile to print out "result" and "src". On one node, distcp started two mappers at (almost) the same time, and they were definitely clashing on the temporary file names. Attaching the two userlogs.

I picked files involved in the clash and fetched them with dfs -get from the source and target clusters. ls -l showed

-rw-r--r-- 1 knoguchi users 133142 Jun 15 10:46 part-270-source
-rw-r--r-- 1 knoguchi users 133848 Jun 15 10:47 part-270-target
-rw-r--r-- 1 knoguchi users 133848 Jun 15 10:48 part-277-source
-rw-r--r-- 1 knoguchi users 133848 Jun 15 10:47 part-277-target

After the copy, the part-270 file was corrupted: its target copy is 133848 bytes, the same size as part-277, rather than the 133142 bytes of its source.

> After successful distcp, couple of checksum error files
> -------------------------------------------------------
>
>                 Key: HADOOP-1491
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1491
>             Project: Hadoop
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.12.3
>            Reporter: Koji Noguchi
>         Attachments: mapper1.txt
>
>
> Tried copying 700,000 files with distcp. 8 mappers per node. Single dfs.client.buffer.dir.
> Distcp ran on 25 nodes mapreduce.
> Couple of tasks failed, but job was successful.
> When checked, 12 files were corrupted. (Checksum error)
> This is repeatable.
> I'll add more information as we find.
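
For illustration only, here is a minimal Java sketch of how two tasks sharing one buffer directory can end up with the same temporary file name, and one way to avoid it. This is not the actual DFSClient.newBackupFile code: the class BackupFileSketch, the methods naiveName and uniqueBackupFile, the "client-" prefix, and the seeded-random naming scheme are all assumptions made for the example. The only library behaviour relied on is that java.io.File.createTempFile creates a new, uniquely named file in the given directory.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Random;

public class BackupFileSketch {

    // Hypothetical naive scheme: the name depends only on a seed.
    // Two tasks that start with the same seed (for example, the same
    // start time) compute the same name and overwrite each other.
    static String naiveName(long seed) {
        return "client-" + Math.abs(new Random(seed).nextLong());
    }

    // Alternative: let the JDK create the file. createTempFile picks a
    // name that does not exist yet in bufferDir and creates it.
    static File uniqueBackupFile(File bufferDir) throws IOException {
        File f = File.createTempFile("client-", ".tmp", bufferDir);
        f.deleteOnExit();
        return f;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a single shared dfs.client.buffer.dir.
        File dir = Files.createTempDirectory("buffer-dir").toFile();
        dir.deleteOnExit();

        long seed = 1181931806671L; // same seed for both "mappers"
        System.out.println("naive clash: "
                + naiveName(seed).equals(naiveName(seed)));

        File a = uniqueBackupFile(dir);
        File b = uniqueBackupFile(dir);
        System.out.println("unique clash: " + a.equals(b));
    }
}
```

With the naive scheme both callers get the same name, which matches the kind of clash seen in the userlogs; with createTempFile the two files are distinct even when requested back to back.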