From: "Hadoop QA (JIRA)"
Reply-To: hadoop-dev@lucene.apache.org
To: hadoop-dev@lucene.apache.org
Date: Thu, 3 May 2007 04:26:15 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-1061) S3 listSubPaths bug
Message-ID: <3234470.1178191575694.JavaMail.jira@brutus>
In-Reply-To: <6016637.1173044870725.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493344 ]

Hadoop QA commented on HADOOP-1061:
-----------------------------------

Integrated in Hadoop-Nightly #77 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/77/)

> S3 listSubPaths bug
> -------------------
>
>                 Key: HADOOP-1061
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1061
>             Project: Hadoop
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.11.2, 0.12.0
>            Reporter: Mike Smith
>            Priority: Critical
>             Fix For: 0.13.0
>
>         Attachments: 1061-hadoop.patch, hadoop-1061-v2.patch, hadoop-1061-v3.patch, HADOOP-1061-v4.patch
>
>
> I ran into a problem with the -ls command in the s3 file system. It returned an inconsistent number of "Found Items" across reruns and, more importantly, it returned recursive results (depth 1) for some folders.
> I looked into the code; the problem is caused by the jets3t library. The inconsistency goes away if we use:
>
>     S3Object[] objects = s3Service.listObjects(bucket, prefix, PATH_DELIMITER);
>
> instead of
>
>     S3Object[] objects = s3Service.listObjects(bucket, prefix, PATH_DELIMITER, 0);
>
> in listSubPaths of the Jets3tFileSystemStore class (line 227). This change lets the GET REST request carry a "max-keys" parameter with its default value of 1000; the s3 GET request is sensitive to this parameter.
> The recursive problem, however, is that the GET request does not apply the delimiter constraint correctly: the response contains all keys with the given prefix instead of stopping at the path delimiter. You can reproduce this by creating a couple of folders on the hadoop s3 filesystem and running -ls.
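To make the proposed change concrete, here is a minimal standalone sketch built around the two jets3t calls quoted above. The class and method names (S3ListingSketch, listSubKeys) are illustrative stand-ins, not the actual Jets3tFileSystemStore code; only the s3Service.listObjects() calls come from the report.

    import java.io.IOException;
    import org.jets3t.service.S3Service;
    import org.jets3t.service.S3ServiceException;
    import org.jets3t.service.model.S3Bucket;
    import org.jets3t.service.model.S3Object;

    class S3ListingSketch {
        private static final String PATH_DELIMITER = "/";

        // Lists the keys one level below 'prefix'; a stand-in for
        // Jets3tFileSystemStore.listSubPaths(), not the real method body.
        static String[] listSubKeys(S3Service s3Service, S3Bucket bucket, String prefix)
                throws IOException {
            try {
                // Buggy form per the report: listObjects(bucket, prefix, PATH_DELIMITER, 0),
                // which pins maxListingLength to 0. The three-argument overload below
                // lets the request fall back to the default "max-keys" of 1000 instead.
                S3Object[] objects = s3Service.listObjects(bucket, prefix, PATH_DELIMITER);
                String[] keys = new String[objects.length];
                for (int i = 0; i < objects.length; i++) {
                    keys[i] = objects[i].getKey();
                }
                return keys;
            } catch (S3ServiceException e) {
                throw new IOException(e.toString());
            }
        }
    }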
> I traced the generated GET request and it looks fine, but it is not executed correctly on the s3 server side. I still don't know why the response doesn't stop at the path delimiter.
> Possible cause: the jets3t library already does URL encoding, so why do we need to do URL encoding in the Jets3tFileSystemStore class at all?
> Example: the original path /user/root/folder is encoded to %2Fuser%2Froot%2Ffolder in the Jets3tFileSystemStore class. Jets3t then re-encodes it while building the REST request, so it goes over the wire as %252Fuser%252Froot%252Ffolder, and after decoding on the Amazon side the folder created on S3 is named %2Fuser%2Froot%2Ffolder. Wouldn't it be better to skip the encoding step in Hadoop? This doubly encoded structure might be the reason s3 doesn't stop at the path delimiter.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
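The double-encoding sequence described above is easy to reproduce with java.net.URLEncoder alone. This sketch only mimics the two encoding passes; it is not the actual Hadoop or jets3t code path:

    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;

    public class DoubleEncodeDemo {
        public static void main(String[] args) throws UnsupportedEncodingException {
            String path = "/user/root/folder";

            // First pass, standing in for the encoding done in Jets3tFileSystemStore:
            String once = URLEncoder.encode(path, "UTF-8");
            System.out.println(once);   // %2Fuser%2Froot%2Ffolder

            // Second pass, standing in for jets3t encoding the key again while
            // building the REST request:
            String twice = URLEncoder.encode(once, "UTF-8");
            System.out.println(twice);  // %252Fuser%252Froot%252Ffolder
        }
    }

Amazon decodes the key once on receipt, which is why the bucket ends up holding a literal %2Fuser%2Froot%2Ffolder key, as the report describes.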