hadoop-hdfs-issues mailing list archives

From "Bob Hansen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-10574) webhdfs fails with filenames including semicolons
Date Fri, 24 Jun 2016 19:29:26 GMT
Bob Hansen created HDFS-10574:
---------------------------------

             Summary: webhdfs fails with filenames including semicolons
                 Key: HDFS-10574
                 URL: https://issues.apache.org/jira/browse/HDFS-10574
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: webhdfs
    Affects Versions: 2.7.0
            Reporter: Bob Hansen


Via webhdfs or native HDFS, we can create files with semicolons in their names:

{code}
bhansen@::1 /tmp$ hdfs dfs -copyFromLocal /tmp/data "webhdfs://localhost:50070/foo;bar"
bhansen@::1 /tmp$ hadoop fs -ls /
Found 1 items
-rw-r--r--   2 bhansen supergroup          9 2016-06-24 12:20 /foo;bar
{code}

Attempting to fetch the file via webhdfs fails:
{code}
bhansen@::1 /tmp$ curl -L "http://localhost:50070/webhdfs/v1/foo%3Bbar?user.name=bhansen&op=OPEN"
{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File
does not exist: /foo\n\tat org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)\n\tat
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891)\n\tat
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832)\n\tat
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812)\n\tat
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784)\n\tat
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542)\n\tat
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362)\n\tat
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)\n\tat
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)\n\tat
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)\n\tat java.security.AccessController.doPrivileged(Native
Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)\n\tat
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)\n"}}
{code}

It appears (from the attached TCP dump in curl_request.txt) that the NameNode's redirect unescapes
the semicolon, and that the DataNode's HTTP server then splits the request path at the semicolon
and fails to find the file "foo".
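
To illustrate the suspected mechanism, here is a minimal sketch using Python's standard urllib.parse, which treats ";" in the last path segment of an http URL as the start of path parameters. This is not the DataNode's actual parsing code; the host name datanode.example, the port, and the helper path_seen_by_server are assumptions made up for the example.

```python
from urllib.parse import quote, unquote, urlparse


def path_seen_by_server(url: str) -> str:
    """Hypothetical helper: split off ';' path parameters, then percent-decode the path."""
    return unquote(urlparse(url).path)


name = "/foo;bar"
base = "http://datanode.example:50075/webhdfs/v1"  # assumed host and port

# Semicolon kept percent-encoded, as in the original curl request.
escaped = base + quote(name, safe="/")
# Semicolon unescaped, as the redirect appears to send it.
unescaped = base + name

print(path_seen_by_server(escaped))    # /webhdfs/v1/foo;bar
print(path_seen_by_server(unescaped))  # /webhdfs/v1/foo
```

With the semicolon left as %3B the full name survives; once it is unescaped, everything after the semicolon is treated as a parameter and the path becomes "foo", which matches the FileNotFoundException above.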



Interesting side note:
* In the attached dfs_copyfrom_local_traffic.txt, you can see the copyFromLocal command writing
the data to "foo;bar_COPYING_"; that request is redirected and actually writes to "foo".  The
subsequent rename attempts to rename "foo;bar_COPYING_" to "foo;bar", but hits the same parsing
bug, so it effectively renames "foo" to "foo;bar".

Here is the full set of special characters we started with, which we then reduced to the
minimal reproducer above:
{code}
hdfs dfs -copyFromLocal /tmp/data webhdfs://localhost:50070/'~`!@#$%^& ()-_=+|<.>]}",\\\[\{\*\?\;'\''data'
curl -L "http://localhost:50070/webhdfs/v1/%7E%60%21%40%23%24%25%5E%26+%28%29-_%3D%2B%7C%3C.%3E%5D%7D%22%2C%5C%5B%7B*%3F%3B%27data?user.name=bhansen&op=OPEN&offset=0"
{code}

Thanks to [~anatoli.shein] for making a concise reproducer.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


