Message-ID: <258452725.1244764147570.JavaMail.jira@brutus>
Date: Thu, 11 Jun 2009 16:49:07 -0700 (PDT)
From: "Aaron Kimball (JIRA)"
To: core-dev@hadoop.apache.org
Subject: [jira] Created: (HADOOP-6016) distcp -pugp does not work when copying to a local file system

distcp -pugp does not work when copying to a local file system
--------------------------------------------------------------

                 Key: HADOOP-6016
                 URL: https://issues.apache.org/jira/browse/HADOOP-6016
             Project: Hadoop Core
          Issue Type: Bug
          Components: tools/distcp
    Affects Versions: 0.18.3
            Reporter: Aaron Kimball

To achieve rsync-like behavior between a local directory and an HDFS instance, a pseudo-distributed MapReduce cluster was started, connected to a fully distributed HDFS instance. An initial distcp from HDFS down to the local filesystem succeeded. The following day, another distcp was run with:

$ bin/hadoop distcp -pugp -update hdfs://nn:7276/data/raw file:///data/raw

It failed; its output is below:

09/06/07 13:14:51 INFO tools.DistCp: srcPaths=[hdfs://nn:7276/data/raw]
09/06/07 13:14:51 INFO tools.DistCp: destPath=file:/data/raw
09/06/07 13:14:55 INFO tools.DistCp: srcCount=10955
09/06/07 13:14:56 INFO mapred.JobClient: Running job: job_200906071310_0001
09/06/07 13:14:57 INFO mapred.JobClient: map 0% reduce 0%
09/06/07 13:15:24 INFO mapred.JobClient: map 1% reduce 0%
09/06/07 13:17:34 INFO mapred.JobClient: map 2% reduce 0%
09/06/07 13:20:04 INFO mapred.JobClient: map 3% reduce 0%
09/06/07 13:20:49 INFO mapred.JobClient: map 4% reduce 0%
09/06/07 13:21:44 INFO mapred.JobClient: map 5% reduce 0%
09/06/07 13:22:33 INFO mapred.JobClient: map 6% reduce 0%
09/06/07 13:25:14 INFO mapred.JobClient: map 7% reduce 0%
09/06/07 13:27:14 INFO mapred.JobClient: map 8% reduce 0%
09/06/07 13:33:34 INFO mapred.JobClient: map 9% reduce 0%
09/06/07 13:37:30 INFO mapred.JobClient: map 10% reduce 0%
09/06/07 13:40:05 INFO mapred.JobClient: map 11% reduce 0%
09/06/07 13:44:55 INFO mapred.JobClient: map 12% reduce 0%
09/06/07 13:48:55 INFO mapred.JobClient: map 13% reduce 0%
09/06/07 13:54:41 INFO mapred.JobClient: map 14% reduce 0%
09/06/07 13:58:30 INFO mapred.JobClient: map 15% reduce 0%
09/06/07 14:00:46 INFO mapred.JobClient: map 16% reduce 0%
09/06/07 14:01:36 INFO mapred.JobClient: map 17% reduce 0%
09/06/07 14:04:12 INFO mapred.JobClient: map 13% reduce 0%
09/06/07 14:04:12 INFO mapred.JobClient: Task Id : attempt_200906071310_0001_m_000006_0, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 264 Failed: 39
        at org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:542)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)

09/06/07 14:04:19 INFO mapred.JobClient: Task Id : attempt_200906071310_0001_m_000006_1, Status : FAILED
java.io.FileNotFoundException: File does not exist: hdfs://nn:7276/tmp/hadoop/mapred/system/distcp_m8n2e/_distcp_src_files
        at org.apache.hadoop.dfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:412)
        at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:684)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1420)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1415)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
        at org.apache.hadoop.tools.DistCp$CopyInputFormat.getRecordReader(DistCp.java:272)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)

(several more tasks fail for the same reason with FileNotFoundException)

With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1113)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:619)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:768)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:788)

This distcp update operation does succeed without -pugp.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.