Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Reply-To: user@hadoop.apache.org
MIME-Version: 1.0
Date: Fri, 17 Apr 2015 14:26:35 +0200
Subject: Re: Found weird issue with HttpFS and WebHdfsFileSystem
From: Bram Biesbrouck
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=089e0102e498f638d10513eab078

--089e0102e498f638d10513eab078
Content-Type: text/plain; charset=UTF-8

Hi Chris,

Thanks for this reply. I thought something funny was happening.

The childrenNum field is actually very useful (e.g. for deciding whether to render an expansion marker next to a folder in a GUI), so it's a pity that the info is there but gets "eaten up" by the general interface, only to be recalculated later on.
It would be nice to have the info as an optional field in the FileStatus class (initialized to -1 like it is right now), so we can use it if it's there or just ignore it when it's not initialized. While I'm ranting, HdfsFileStatus should extend FileStatus because it's 95% the same code anyway.

If I read your reply correctly, I assume the fields will be removed from the WebHDFS JSON responses as well in the future?

Thanks again for the extensive reply, very useful and appreciated.

cheers,

b.

On Fri, Apr 17, 2015 at 12:59 AM, Chris Nauroth wrote:

> Hello Bram,
>
> There are a few Apache jiras with background discussion of the
> introduction of these fields in WebHDFS.
>
> https://issues.apache.org/jira/browse/HDFS-4502
>
> https://issues.apache.org/jira/browse/HDFS-4772
>
> https://issues.apache.org/jira/browse/HDFS-4969
>
> The new fields could not be supported in HTTPFS (only WebHDFS), and they
> were not intended to be guaranteed in the public REST API. Unfortunately,
> the fields were added to the documentation mistakenly in Apache Hadoop
> 2.5.0.
>
> https://issues.apache.org/jira/browse/HDFS-6153
>
> We're going to revert that documentation change in Apache Hadoop 2.8.0.
> I suggest that your application does not rely on these fields, or at least
> includes fallback logic to keep working as best as it can if the fields are
> not present. Another way to determine the number of children would be to
> make a subsequent LISTSTATUS call on the child path.
>
> I apologize if this caused any inconvenience, and I hope the information
> helps.
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
> From: Bram Biesbrouck
> Reply-To: "user@hadoop.apache.org"
> Date: Thursday, April 16, 2015 at 7:58 AM
> To: "user@hadoop.apache.org"
> Subject: Found weird issue with HttpFS and WebHdfsFileSystem
>
> Hi all,
>
> I'm experiencing something strange while developing against the HttpFS
> front-end webapp on Hadoop 2.6.0.
>
> I'm currently digging into WebHdfsFileSystem and HttpFS to understand them
> better and to understand how the REST API works. I've set up a local
> single-node Hadoop instance, which I can query successfully with e.g.
> http://localhost:50070/webhdfs/v1/?op=LISTSTATUS
> This returns e.g. this FileStatus object:
>
> {
> accessTime: 0,
> blockSize: 0,
> childrenNum: 0,
> fileId: 16386,
> group: "supergroup",
> length: 0,
> modificationTime: 1417964248854,
> owner: "hadoop",
> pathSuffix: "user",
> permission: "755",
> replication: 0,
> storagePolicy: 0,
> type: "DIRECTORY"
> }
>
> Now, when I start HttpFS and ask for the same data over its interface (
> http://localhost:14000/webhdfs/v1/?op=LISTSTATUS), I get a different
> reply. In particular, the childrenNum and fileId fields are missing compared
> to the first result (same file or directory):
>
> {
> pathSuffix: "user",
> type: "DIRECTORY",
> length: 0,
> owner: "hadoop",
> group: "supergroup",
> permission: "755",
> accessTime: 0,
> modificationTime: 1417964248854,
> blockSize: 0,
> replication: 0
> }
>
> Since I need the childrenNum property, I started digging into the code
> to see where it's "lost" and found that WebHdfsFileSystem performs a
> makeQualified() step (around line 1287 in WebHdfsFileSystem.java) just
> before the list of file statuses is returned. Basically, it converts
> HdfsFileStatus objects into FileStatus objects, effectively chopping off
> those two properties.
>
> The sources for HdfsFileStatus clearly state that it's an "Interface
> that represents the over the wire information for a file.", so I wonder why
> this happens, since HdfsFileStatus contains all the right properties
> according to the docs at
> http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#List_a_Directory
>
> It feels like the FileStatus class hasn't been updated to match the
> HdfsFileStatus class, but since they don't share any interfaces or
> superclasses I get the feeling it's intentional; I just can't find or
> figure out why.
>
> Can somebody help or shed some light?
>
> thanks,
>
> b.
> --
>
> Bram Biesbrouck - 0486/118280 - www.beligum.com - the republic of
> reinvention
>
>

--
Bram Biesbrouck - 0486/118280 - www.beligum.com - the republic of reinvention

--089e0102e498f638d10513eab078
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Hi Chris,

Thanks for this reply. I thought something funny was happening.

The childrenNum field is actually very useful (e.g. for deciding whether to render an expansion marker next to a folder in a GUI), so it's a pity that the info is there but gets "eaten up" by the general interface, only to be recalculated later on.
It would be nice to have the info as an optional field in the FileStatus class (initialized to -1 like it is right now), so we can use it if it's there or just ignore it when it's not initialized. While I'm ranting, HdfsFileStatus should extend FileStatus because it's 95% the same code anyway.
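
The optional-field idea can be pictured with a small sketch. It is written in Python rather than Hadoop's Java, and every name in it (FileStatusSketch, UNKNOWN_CHILDREN, show_expander) is hypothetical and not part of any Hadoop API; it only illustrates "use the count if it is there, ignore it otherwise".

```python
from dataclasses import dataclass
from typing import Optional

# Sentinel mirroring the "initialized to -1" idea; the name is an assumption.
UNKNOWN_CHILDREN = -1


@dataclass
class FileStatusSketch:
    """Hypothetical stand-in for a status record; not Hadoop's FileStatus."""
    path_suffix: str
    kind: str  # "DIRECTORY" or "FILE", as in the JSON "type" field
    children_num: int = UNKNOWN_CHILDREN

    def known_children(self) -> Optional[int]:
        """Return the child count, or None when it was never supplied."""
        return None if self.children_num < 0 else self.children_num


def show_expander(status: FileStatusSketch) -> bool:
    """Decide whether a GUI should draw an expansion marker for this entry."""
    if status.kind != "DIRECTORY":
        return False
    count = status.known_children()
    # With an unknown count, draw the marker optimistically.
    return True if count is None else count > 0
```

A caller that receives no count simply leaves children_num at its default, and show_expander() falls back to always showing the marker for directories.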

If I read your reply correctly, I assume the fields will be removed from the WebHDFS JSON responses as well in the future?

Thanks again for the extensive reply, very useful and appreciated.

cheers,

b.



On Fri, Apr 17, 2015 at 12:59 AM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
Hello Bram,

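There are a few Apache jiras with background discussion of the introduction of these fields in WebHDFS.

https://issues.apache.org/jira/browse/HDFS-4502
https://issues.apache.org/jira/browse/HDFS-4772
https://issues.apache.org/jira/browse/HDFS-4969
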
The new fields could not be supported in HTTPFS (only WebHDFS), and they were not intended to be guaranteed in the public REST API. Unfortunately, the fields were added to the documentation mistakenly in Apache Hadoop 2.5.0.

https://issues.apache.org/jira/browse/HDFS-6153

We're going to revert that documentation change in Apache Hadoop 2.8.0. I suggest that your application does not rely on these fields, or at least includes fallback logic to keep working as best as it can if the fields are not present. Another way to determine the number of children would be to make a subsequent LISTSTATUS call on the child path.
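
A minimal sketch of such fallback logic, in Python, might look like the following. The function names and the base_url parameter are assumptions for illustration, authentication parameters are omitted, and the response layout (FileStatuses / FileStatus) is the one the WebHDFS REST API uses for LISTSTATUS.

```python
import json
from typing import Callable, List
from urllib.parse import quote
from urllib.request import urlopen


def liststatus(base_url: str, path: str) -> List[dict]:
    """Issue a LISTSTATUS call and return the list of FileStatus entries.

    base_url is e.g. "http://localhost:14000" (assumed, no auth handling).
    """
    url = f"{base_url}/webhdfs/v1{quote(path)}?op=LISTSTATUS"
    with urlopen(url) as resp:
        body = json.load(resp)
    return body["FileStatuses"]["FileStatus"]


def children_count(status: dict, child_path: str,
                   lister: Callable[[str], List[dict]]) -> int:
    """Use childrenNum when the server sent it, else list the child path."""
    if status.get("type") != "DIRECTORY":
        return 0
    num = status.get("childrenNum")
    if isinstance(num, int) and num >= 0:
        return num
    # Field absent (as with HttpFS here): fall back to a second call.
    return len(lister(child_path))
```

In real use the lister argument could be something like `lambda p: liststatus("http://localhost:14000", p)`; passing it in as a callable keeps the fallback decision separate from the HTTP call. The extra request per directory is the price of not relying on the undocumented field.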

I apologize if this caused any inconvenience, and I hope the information helps.

Chris Nauroth
Hortonworks
http://hortonworks.com/

From: Bram Biesbrouck <b@beligum.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Thursday, April 16, 2015 at 7:58 AM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Found weird issue with HttpFS and WebHdfsFileSystem

Hi all,

I'm experiencing something strange while developing against the HttpFS front-end webapp on Hadoop 2.6.0.

I'm currently digging into WebHdfsFileSystem and HttpFS to understand them better and to understand how the REST API works. I've set up a local single-node Hadoop instance, which I can query successfully with e.g. http://localhost:50070/webhdfs/v1/?op=LISTSTATUS
This returns e.g. this FileStatus object:

{
accessTime: 0,
blockSize: 0,
childrenNum: 0,
fileId: 16386,
group: "supergroup",
length: 0,
modificationTime: 1417964248854,
owner: "hadoop",
pathSuffix: "user",
permission: "755",
replication: 0,
storagePolicy: 0,
type: "DIRECTORY"
}

Now, when I start HttpFS and ask for the same data over its interface (http://localhost:14000/webhdfs/v1/?op=LISTSTATUS), I get a different reply. In particular, the childrenNum and fileId fields are missing compared to the first result (same file or directory):

{
pathSuffix: "user",
type: "DIRECTORY",
length: 0,
owner: "hadoop",
group: "supergroup",
permission: "755",
accessTime: 0,
modificationTime: 1417964248854,
blockSize: 0,
replication: 0
}
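
The difference between the two replies can be checked mechanically. The sketch below copies the two objects from this message into Python dicts and compares their keys; missing_fields is a hypothetical helper name. Going by the two replies as quoted here, storagePolicy is also absent from the HttpFS reply, in addition to childrenNum and fileId.

```python
# Reply from the NameNode's WebHDFS endpoint (port 50070), as quoted above.
webhdfs_status = {
    "accessTime": 0, "blockSize": 0, "childrenNum": 0, "fileId": 16386,
    "group": "supergroup", "length": 0, "modificationTime": 1417964248854,
    "owner": "hadoop", "pathSuffix": "user", "permission": "755",
    "replication": 0, "storagePolicy": 0, "type": "DIRECTORY",
}

# Reply from HttpFS (port 14000) for the same directory, as quoted above.
httpfs_status = {
    "pathSuffix": "user", "type": "DIRECTORY", "length": 0,
    "owner": "hadoop", "group": "supergroup", "permission": "755",
    "accessTime": 0, "modificationTime": 1417964248854,
    "blockSize": 0, "replication": 0,
}


def missing_fields(reference: dict, other: dict) -> list:
    """Return the keys present in reference but absent from other, sorted."""
    return sorted(set(reference) - set(other))


print(missing_fields(webhdfs_status, httpfs_status))
# prints ['childrenNum', 'fileId', 'storagePolicy']
```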

Since I need the childrenNum property, I started digging into the code to see where it's "lost" and found that WebHdfsFileSystem performs a makeQualified() step (around line 1287 in WebHdfsFileSystem.java) just before the list of file statuses is returned. Basically, it converts HdfsFileStatus objects into FileStatus objects, effectively chopping off those two properties.
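
As a toy model of that kind of narrowing conversion (this is not the Hadoop source, and it is not what makeQualified() literally does), the following Python sketch copies only a fixed set of generic fields from a richer record. The field list is taken from the HttpFS reply quoted in this message; the names GENERIC_FIELDS and to_generic_status are hypothetical.

```python
# Fields that survive in the HttpFS reply quoted in this thread (assumed to
# stand in for what the generic FileStatus carries).
GENERIC_FIELDS = (
    "pathSuffix", "type", "length", "owner", "group", "permission",
    "accessTime", "modificationTime", "blockSize", "replication",
)


def to_generic_status(wire_status: dict) -> dict:
    """Copy only the generic fields; anything else is silently dropped."""
    return {k: wire_status[k] for k in GENERIC_FIELDS if k in wire_status}


rich = {"pathSuffix": "user", "type": "DIRECTORY", "childrenNum": 0,
        "fileId": 16386, "length": 0}
narrow = to_generic_status(rich)
# childrenNum and fileId are gone after the conversion.
assert "childrenNum" not in narrow and "fileId" not in narrow
```

Once the data has passed through such a step, nothing downstream can recover the dropped fields, which matches the behaviour observed above.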

The sources for HdfsFileStatus clearly state that it's an "Interface that represents the over the wire information for a file.", so I wonder why this happens, since HdfsFileStatus contains all the right properties according to the docs at http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#List_a_Directory

It feels like the FileStatus class hasn't been updated to match the HdfsFileStatus class, but since they don't share any interfaces or superclasses I get the feeling it's intentional; I just can't find or figure out why.

Can somebody help or shed some light?

thanks,

b.
--

Bram Biesbrouck - 0486/118280 - www.beligum.com - the republic of reinvention




--

Bram Biesbrouck - 0486/118280 - www.beligum.com - the republic of reinvention

--089e0102e498f638d10513eab078--