hadoop-hdfs-user mailing list archives

From Chris Nauroth <cnaur...@hortonworks.com>
Subject Re: Found weird issue with HttpFS and WebHdfsFileSystem
Date Thu, 16 Apr 2015 22:59:48 GMT
Hello Bram,

There are a few Apache jiras with background discussion of the introduction of these fields
in WebHDFS.

https://issues.apache.org/jira/browse/HDFS-4502

https://issues.apache.org/jira/browse/HDFS-4772

https://issues.apache.org/jira/browse/HDFS-4969

The new fields could not be supported in HttpFS (only in WebHDFS), and they were not intended
to be guaranteed in the public REST API. Unfortunately, the fields were mistakenly added to the
documentation in Apache Hadoop 2.5.0.

https://issues.apache.org/jira/browse/HDFS-6153

We're going to revert that documentation change in Apache Hadoop 2.8.0.  I suggest that your
application does not rely on these fields, or at least includes fallback logic to keep working
as best as it can if the fields are not present.  Another way to determine the number of children
would be to make a subsequent LISTSTATUS call on the child path.
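
As a rough illustration of that fallback, here is a minimal sketch in Python. The base URL is
only an assumption taken from the example later in this thread, and list_status and
children_count are hypothetical helper names, not part of any Hadoop client library. The sketch
uses childrenNum when the server sends it and otherwise counts the entries returned by a second
LISTSTATUS call on the child path.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: a WebHDFS or HttpFS endpoint like the ones quoted in this thread.
BASE_URL = "http://localhost:50070/webhdfs/v1"


def list_status(path, base_url=BASE_URL):
    """Return the FileStatus entries that LISTSTATUS reports for `path`."""
    url = f"{base_url}{quote(path)}?op=LISTSTATUS"
    with urlopen(url) as response:
        body = json.load(response)
    return body["FileStatuses"]["FileStatus"]


def children_count(parent, status, lister=list_status):
    """Count the children of one entry taken from a LISTSTATUS reply.

    Uses childrenNum when the server sent it; otherwise falls back to a
    second LISTSTATUS call on the child path and counts the entries.
    """
    if status.get("type") != "DIRECTORY":
        return 0  # plain files have no children
    if "childrenNum" in status:
        return status["childrenNum"]
    child_path = parent.rstrip("/") + "/" + status["pathSuffix"]
    return len(lister(child_path))
```

The lister argument exists only so the HTTP call can be swapped out, for example in tests. The
fallback costs one extra request per directory, so it may be worth caching on large listings.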

I apologize if this caused any inconvenience, and I hope the information helps.

Chris Nauroth
Hortonworks
http://hortonworks.com/


From: Bram Biesbrouck <b@beligum.com>
Reply-To: user@hadoop.apache.org
Date: Thursday, April 16, 2015 at 7:58 AM
To: user@hadoop.apache.org
Subject: Found weird issue with HttpFS and WebHdfsFileSystem

Hi all,

I'm experiencing something strange while developing against the HttpFS front-end webapp on
Hadoop 2.6.0.

I'm currently digging into WebHdfsFileSystem and HttpFS to understand them better and to see
how the REST API works. I've set up a local single-node Hadoop instance, which I can query
successfully with e.g. http://localhost:50070/webhdfs/v1/?op=LISTSTATUS
This returns, for example, the following FileStatus object:

{
accessTime: 0,
blockSize: 0,
childrenNum: 0,
fileId: 16386,
group: "supergroup",
length: 0,
modificationTime: 1417964248854,
owner: "hadoop",
pathSuffix: "user",
permission: "755",
replication: 0,
storagePolicy: 0,
type: "DIRECTORY"
}

Now, when I start HttpFS and ask for the same data over its interface (http://localhost:14000/webhdfs/v1/?op=LISTSTATUS),
I get a different reply. In particular, the childrenNum and fileId fields are missing compared
to the first result (same file or directory):

{
pathSuffix: "user",
type: "DIRECTORY",
length: 0,
owner: "hadoop",
group: "supergroup",
permission: "755",
accessTime: 0,
modificationTime: 1417964248854,
blockSize: 0,
replication: 0
}
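
One quick way to compare the two replies is a set difference over their field names. This is
only a small Python sketch, with missing_fields as a hypothetical helper name and the two
dictionaries copied from the replies quoted above. For these two replies it also reports
storagePolicy as absent from the HttpFS result.

```python
def missing_fields(reference, other):
    """Return the field names present in `reference` but absent from `other`."""
    return sorted(set(reference) - set(other))


# The two replies quoted above, for the same "user" directory.
webhdfs_status = {
    "accessTime": 0, "blockSize": 0, "childrenNum": 0, "fileId": 16386,
    "group": "supergroup", "length": 0, "modificationTime": 1417964248854,
    "owner": "hadoop", "pathSuffix": "user", "permission": "755",
    "replication": 0, "storagePolicy": 0, "type": "DIRECTORY",
}
httpfs_status = {
    "pathSuffix": "user", "type": "DIRECTORY", "length": 0, "owner": "hadoop",
    "group": "supergroup", "permission": "755", "accessTime": 0,
    "modificationTime": 1417964248854, "blockSize": 0, "replication": 0,
}

print(missing_fields(webhdfs_status, httpfs_status))
# prints ['childrenNum', 'fileId', 'storagePolicy']
```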

Since I need the childrenNum property, I started digging into the code to see where it's "lost"
and found that WebHdfsFileSystem performs a makeQualified() step (around line 1287 in WebHdfsFileSystem.java),
just before the list of filestatuses is returned. Basically, it converts HdfsFileStatus objects
into FileStatus objects, effectively chopping off those two properties.

The sources for HdfsFileStatus clearly state that it's an "Interface that represents the over
the wire information for a file.", so I wonder why this happens, since the HdfsFileStatus
contains all the right properties, according to the docs at http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#List_a_Directory

It feels like the FileStatus class hasn't been updated to match the HdfsFileStatus class.
Since the two don't share any interfaces or superclasses, I get the feeling that this is
intentional, but I can't find or figure out why.

Can somebody help or shed some light?

thanks,

b.
--

Bram Biesbrouck - 0486/118280 - www.beligum.com
-  the republic of reinvention
