Message-ID: <1392781043.1212523485322.JavaMail.jira@brutus>
Date: Tue, 3 Jun 2008 13:04:45 -0700 (PDT)
From: "Tom White (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-930) Add support for reading regular (non-block-based) files from S3 in S3FileSystem

[ https://issues.apache.org/jira/browse/HADOOP-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602056#action_12602056 ]

Tom White commented on HADOOP-930:
----------------------------------

Thanks for the review, Chris.

bq.
Any reason you didn't use the mime type to denote directory files (as jets3t does)?

It's to do with efficiency of listing directories. If you use the MIME type, you can't tell the difference between files and directories when listing bucket keys, so you have to query each key in a directory individually, which can be prohibitively slow. But if you use the _$folder$ suffix convention (which S3Fox uses too, BTW), you can distinguish files from directories directly from the listing.

bq. I believe MD5 checksum should be set on s3 put (via header), and verified on s3 get.

The code should be doing this. I agree that it's useful - in fact, the other S3 filesystem needs updating to do this too.

bq. Sometimes 'legacy' buckets have underscores, might consider trying to survive them.

Thanks for the tip. The code does detect this condition, but it might be nice to try to work around it as you say (perhaps emitting a warning). Have you done this elsewhere?

> Add support for reading regular (non-block-based) files from S3 in S3FileSystem
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-930
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 0.10.1
>            Reporter: Tom White
>            Assignee: Tom White
>             Fix For: 0.18.0
>
>         Attachments: hadoop-930-v2.patch, hadoop-930-v3.patch, hadoop-930-v4.patch, hadoop-930.patch, jets3t-0.6.0.jar
>
>
> People often have input data on S3 that they want to use for a MapReduce job, and the current S3FileSystem implementation cannot read it since it assumes a block-based format.
> We would add the following metadata to files written by S3FileSystem: an indication that it is block oriented ("S3FileSystem.type=block") and a filesystem version number ("S3FileSystem.version=1.0"). Regular S3 files would not have the type metadata, so S3FileSystem would not try to interpret them as inodes.
> An extension to write regular files to S3 would not be covered by this change - we could do this as a separate piece of work (we still need to decide whether to introduce another scheme - e.g. rename block-based S3 to "s3fs" and call regular S3 "s3" - or whether to just use a configuration property to control block-based vs. regular writes).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
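The _$folder$ suffix convention discussed above can be sketched in a few lines. This is an illustrative helper, not Hadoop's actual API: a directory "logs/2008" is marked by an empty object stored under the key "logs/2008_$folder$", so a single bucket listing is enough to tell files and directories apart without a per-key metadata query.

```java
// Hypothetical helper illustrating the _$folder$ directory-marker
// convention (names here are illustrative, not from Hadoop's source).
public class S3KeyConventions {
    static final String FOLDER_SUFFIX = "_$folder$";

    // A key such as "logs/2008_$folder$" marks the directory "logs/2008".
    static boolean isDirectoryKey(String key) {
        return key.endsWith(FOLDER_SUFFIX);
    }

    // Recover the directory path from its marker key.
    static String directoryPath(String key) {
        return key.substring(0, key.length() - FOLDER_SUFFIX.length());
    }

    public static void main(String[] args) {
        System.out.println(isDirectoryKey("logs/2008_$folder$"));   // true
        System.out.println(isDirectoryKey("logs/2008/part-00000")); // false
        System.out.println(directoryPath("logs/2008_$folder$"));    // logs/2008
    }
}
```

The advantage over a MIME-type marker is exactly what the comment describes: the distinction is visible in the key names returned by a list-bucket call, so no extra round trip per key is needed.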
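The MD5 point above can also be made concrete. S3's REST API expects the Content-MD5 header to carry the base64-encoded 128-bit MD5 digest of the object body, which S3 verifies on put; on get, a client can recompute the digest and compare. A minimal sketch of computing that header value with the JDK (the method name is an assumption for illustration):

```java
import java.security.MessageDigest;
import java.util.Base64;

public class ContentMd5 {
    // S3's Content-MD5 header is the base64 encoding of the raw
    // 16-byte MD5 digest of the request body (per the S3 REST API).
    static String contentMd5(byte[] body) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5").digest(body);
        return Base64.getEncoder().encodeToString(digest);
    }

    public static void main(String[] args) throws Exception {
        byte[] body = "hello".getBytes("US-ASCII");
        // Set this value as the Content-MD5 header on put; on get,
        // recompute from the downloaded bytes and compare.
        System.out.println(contentMd5(body));
    }
}
```

On get, comparing the recomputed value against the stored digest catches corruption in transit, which is the verification step the review comment asks for.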