hadoop-hdfs-issues mailing list archives

From "Srinivasu Majeti (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-14323) Distcp fails in Hadoop 3.x when 2.x source webhdfs url has special characters in hdfs file path
Date Fri, 01 Mar 2019 05:06:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781294#comment-16781294 ]

Srinivasu Majeti commented on HDFS-14323:
-----------------------------------------

Hi [~jojochuang] and [~zvenczel],

To reproduce this, we need to try a webhdfs URL whose path contains a special character, like the one below.

[hdfs@c1265-node2 root]$ hadoop distcp webhdfs://c2265-node2.hwx.com:50070/tmp/date=1234557 /check

You should see a failure with an exception like the one below.

19/02/21 06:35:59 DEBUG security.UserGroupInformation: PrivilegedActionException as:xxxxxxxx (auth:KERBEROS) cause:java.io.FileNotFoundException: File does not exist: /tmp/date%3D1234557

19/02/21 06:35:59 DEBUG ipc.ProtobufRpcEngine: Call: delete took 4ms
19/02/21 06:35:59 ERROR tools.DistCp: Invalid input:
org.apache.hadoop.tools.CopyListing$InvalidInputException: webhdfs://c2265-node2.hwx.com:50070/tmp/date=1234557 doesn't exist

c1265-node2 -> 3.x client (Hadoop 3.2.0)

c2265-node2.hwx.com -> 2.x cluster NameNode (NN)
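
For illustration only, here is a minimal sketch of how the '=' in the requested path relates to the '%3D' in the FileNotFoundException above. It uses the standard java.net.URLEncoder; the class and method names are hypothetical, and that the 3.x client encodes the path in exactly this way is an assumption, not something the log confirms.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EqualsSignEncoding {
    // Hypothetical helper: percent-encode a single path segment as UTF-8.
    static String encodeSegment(String segment) throws UnsupportedEncodingException {
        return URLEncoder.encode(segment, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // '=' is not in URLEncoder's safe set, so it becomes %3D.
        System.out.println("/tmp/" + encodeSegment("date=1234557"));
        // prints "/tmp/date%3D1234557"
    }
}
```

If the encoded name is then looked up literally on the 2.x side, a file stored as /tmp/date=1234557 would not be found, which would match the error shown.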

Thanks and Regards,

Majeti.

> Distcp fails in Hadoop 3.x when 2.x source webhdfs url has special characters in hdfs file path
> -----------------------------------------------------------------------------------------------
>
>                 Key: HDFS-14323
>                 URL: https://issues.apache.org/jira/browse/HDFS-14323
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>    Affects Versions: 3.2.0
>            Reporter: Srinivasu Majeti
>            Priority: Major
>
> HDFS-13176 added an enhancement to allow semicolons in source/target URLs for the distcp use case, and HDFS-13582 added a backward-compatibility fix. There still seems to be an issue when triggering distcp from a 3.x cluster to pull webhdfs data from a 2.x Hadoop cluster. We might need to adjust the existing fix as shown in the diff below, so that it checks whether the URL is actually already encoded before treating it as such. That fixes it.
> diff --git a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
> index 5936603c34a..dc790286aff 100644
> --- a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
> +++ b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
> @@ -609,7 +609,10 @@ URL toUrl(final HttpOpParam.Op op, final Path fspath,
>  boolean pathAlreadyEncoded = false;
>  try {
>  fspathUriDecoded = URLDecoder.decode(fspathUri.getPath(), "UTF-8");
> - pathAlreadyEncoded = true;
> + if(!fspathUri.getPath().equals(fspathUriDecoded))
> + {
> + pathAlreadyEncoded = true;
> + }
>  } catch (IllegalArgumentException ex) {
>  LOG.trace("Cannot decode URL encoded file", ex);
>  }
>  
>  
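
The check proposed in the diff above can be tried in isolation. The following is a minimal standalone sketch, not the actual WebHdfsFileSystem code: the class and method names are hypothetical, and only java.net.URLDecoder from the standard library is used. It treats a path as already encoded only when decoding actually changes it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PathEncodingCheck {
    // Hypothetical helper mirroring the proposed condition: a path counts as
    // "already encoded" only if URL-decoding produces a different string.
    static boolean looksAlreadyEncoded(String path) {
        try {
            String decoded = URLDecoder.decode(path, "UTF-8");
            return !path.equals(decoded);
        } catch (IllegalArgumentException | UnsupportedEncodingException ex) {
            // Malformed %-sequence or unknown charset: treat as not encoded.
            return false;
        }
    }

    public static void main(String[] args) {
        // Decoding leaves this path unchanged, so it is not flagged.
        System.out.println(looksAlreadyEncoded("/tmp/date=1234557"));   // false
        // Decoding turns %3D into '=', so this path is flagged.
        System.out.println(looksAlreadyEncoded("/tmp/date%3D1234557")); // true
    }
}
```

Note that URLDecoder also turns '+' into a space, so under this check a path containing a literal '+' would be reported as already encoded. Whether that matters for the real fix is not addressed here.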




