Date: Mon, 16 Aug 2010 17:48:17 -0400 (EDT)
From: "Kris Jirapinyo (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] Commented: (HDFS-31) Hadoop distcp tool fails if file path contains special characters + & !

    [ https://issues.apache.org/jira/browse/HDFS-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899111#action_12899111 ]

Kris Jirapinyo commented on HDFS-31:
------------------------------------

Yes, that would be nice.
I was using hftp to copy from a 0.20.1 cluster to a CDH3 cluster (running distcp on the CDH3 cluster), and I ran into the same 500 error. It seems that the URL escaping mechanism produces an incorrect final fetch URL: each + in the file name is turned into a space. For example:

File in HDFS:
/test/twitteruserout2/_logs/history/mi-prod-app01.ec2.biz360.com_1269013964063_job_201003190852_17784_hadoop_twitter+users+extraction+from+source+on+Tue+Apr+20

Fetch filename:
/test/twitteruserout2/_logs/history/mi-prod-app01.ec2.biz360.com_1269013964063_job_201003190852_17784_hadoop_twitter users extraction from source on Tue Apr 20

Error from the specific machine:
2010-08-16 14:33:06,765 WARN org.mortbay.log: /streamFile: java.io.IOException: Cannot open filename /test/twitteruserout2/_logs/history/mi-prod-app01.ec2.biz360.com_1269013964063_job_201003190852_17784_hadoop_twitter users extraction from source on Tue Apr 20

Requesting the file directly over HTTP:
http://mi-prod-app28:50075/streamFile?filename=/test/twitteruserout2/_logs/history/mi-prod-app01.ec2.biz360.com_1269013964063_job_201003190852_17784_hadoop_twitter+users+extraction+from+source+on+Tue+Apr+20&ugi=hadoop,hadoop

does not work either and gives the same error as above. However, if I replace each + with %2B, the GET works.

> Hadoop distcp tool fails if file path contains special characters + & !
> -----------------------------------------------------------------------
>
>                 Key: HDFS-31
>                 URL: https://issues.apache.org/jira/browse/HDFS-31
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 0.20.2, 0.21.0, 0.22.0
>            Reporter: Viraj Bhat
>             Fix For: 0.22.0
>
>
> Copying folders containing + & ! characters between hdfs (using hftp) does not work in distcp
> For example:
> Copying folder "string1+string2" at "namenode.address.com", hftp port myport to "/myotherhome/folder" on "myothermachine" does not work
> myothermachine prompt>>> hadoop --config ~/mycluster/ distcp "hftp://namenode.address.com:myport/myhome/dir/string1+string2" /myotherhome/folder/
> --------------------------------------------------------------------------------
> Error results for hadoop job1:
> --------------------------------------------------------------------------------
> 08/07/16 00:27:39 INFO tools.DistCp: srcPaths=[hftp://namenode.address.com:myport/myhome/dir/string1+string2]
> 08/07/16 00:27:39 INFO tools.DistCp: destPath=/myotherhome/folder/
> 08/07/16 00:27:41 INFO tools.DistCp: srcCount=2
> 08/07/16 00:27:42 INFO mapred.JobClient: Running job: job1
> 08/07/16 00:27:43 INFO mapred.JobClient:  map 0% reduce 0%
> 08/07/16 00:27:58 INFO mapred.JobClient: Task Id : attempt_1_m_000000_0, Status : FAILED
> java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
>     at org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:538)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:226)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2208)
> 08/07/16 00:28:14 INFO mapred.JobClient: Task Id : attempt_1_m_000000_1, Status : FAILED
> java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
>     at org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:538)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:226)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2208)
> 08/07/16 00:28:28 INFO mapred.JobClient: Task Id : attempt_1_m_000000_2, Status : FAILED
> java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
>     at org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:538)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:226)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2208)
> With failures, global counters are inaccurate; consider running with -i
> Copy failed: java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1053)
>     at org.apache.hadoop.tools.DistCp.copy(DistCp.java:615)
>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:764)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:784)
> --------------------------------------------------------------------------------
> Error log for the map task which failed
> --------------------------------------------------------------------------------
> INFO org.apache.hadoop.tools.DistCp: FAIL string1+string2/myjobtrackermachine.com-joblog.tar.gz : java.io.IOException: Server returned HTTP response code: 500 for URL: http://mymachine.com:myport/streamFile?filename=/myhome/dir/string1+string2/myjobtrackermachine.com-joblog.tar.gz&ugi=myid,mygroup
>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1241)
>     at org.apache.hadoop.dfs.HftpFileSystem.open(HftpFileSystem.java:117)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:371)
>     at org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:377)
>     at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:504)
>     at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:279)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:226)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2208)
> --------------------------------------------------------------------------------
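The comment above observes that a literal + in the filename query parameter of a /streamFile request comes back as a space, while %2B works. The following is a minimal Java sketch of that mechanism. It assumes (consistent with the logs, but not confirmed against the Hadoop source) that the server decodes the query string with application/x-www-form-urlencoded rules, in which + means a space and & separates parameters. StreamFileQueryDemo, parseQuery, rawQuery and escapedQuery are hypothetical names for illustration; this is not HftpFileSystem code and not the actual fix for HDFS-31.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative sketch, not Hadoop code. Models a server that decodes a query
 * such as "filename=...&ugi=..." with form-urlencoded rules (assumption):
 * '+' decodes to a space and '&' separates parameters.
 */
public class StreamFileQueryDemo {

    /** Hypothetical server-side parser: split on '&', then on '=', then decode. */
    static Map<String, String> parseQuery(String query) throws UnsupportedEncodingException {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(URLDecoder.decode(name, "UTF-8"), URLDecoder.decode(value, "UTF-8"));
        }
        return params;
    }

    /** Hypothetical client side, naive: the raw HDFS path is pasted into the query. */
    static String rawQuery(String path, String ugi) {
        return "filename=" + path + "&ugi=" + ugi;
    }

    /** Hypothetical client side, escaped: '+' becomes %2B, '&' becomes %26, '/' becomes %2F. */
    static String escapedQuery(String path, String ugi) throws UnsupportedEncodingException {
        return "filename=" + URLEncoder.encode(path, "UTF-8") + "&ugi=" + ugi;
    }

    public static void main(String[] args) throws Exception {
        String ugi = "myid,mygroup";
        for (String path : new String[] {"/myhome/dir/string1+string2", "/myhome/dir/a&b"}) {
            // What the server would see as "filename" for each way of building the query.
            String seenRaw = parseQuery(rawQuery(path, ugi)).get("filename");
            String seenEscaped = parseQuery(escapedQuery(path, ugi)).get("filename");
            System.out.println(path + " -> raw: " + seenRaw + " | escaped: " + seenEscaped);
        }
    }
}
```

Run as-is, main prints one line per path. With the raw query, string1+string2 comes back as "string1 string2" and a&b is cut short to "a" (the "b" becomes a separate, empty parameter); with the escaped query both paths round-trip unchanged. Escaping the whole path with URLEncoder also turns / into %2F, which the decoder maps back to /.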