hadoop-common-dev mailing list archives

From "Chris K Wensel (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-930) Add support for reading regular (non-block-based) files from S3 in S3FileSystem
Date Tue, 03 Jun 2008 20:44:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602067#action_12602067 ]

Chris K Wensel commented on HADOOP-930:

{quote}It's to do with efficiency of listing directories. If you use mime type then you can't
tell the difference between files and directories when listing bucket keys. So you have to
query each key in a directory which can be prohibitively slow. But if you use the _$folder$
suffix convention (which S3Fox uses too BTW) you can easily distinguish files and directories.{quote}
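The suffix convention quoted above can be sketched in a few lines. This is an illustrative example, not the actual Hadoop or S3Fox code; the class and key names are hypothetical. The point is that files and directories can be distinguished from the key names alone, with no extra request per key.

```java
import java.util.ArrayList;
import java.util.List;

public class FolderSuffixDemo {
    // S3Fox-style convention: a key like "logs_$folder$" marks "logs" as a directory.
    static final String FOLDER_SUFFIX = "_$folder$";

    static boolean isDirectoryMarker(String key) {
        return key.endsWith(FOLDER_SUFFIX);
    }

    // Recover the directory name from its marker key.
    static String directoryName(String markerKey) {
        return markerKey.substring(0, markerKey.length() - FOLDER_SUFFIX.length());
    }

    public static void main(String[] args) {
        // Hypothetical keys as they might come back from a single bucket listing.
        String[] listedKeys = { "logs_$folder$", "logs/part-00000", "README.txt" };
        List<String> dirs = new ArrayList<>();
        List<String> files = new ArrayList<>();
        for (String key : listedKeys) {
            if (isDirectoryMarker(key)) {
                dirs.add(directoryName(key));  // classified without touching the key again
            } else {
                files.add(key);
            }
        }
        System.out.println("dirs=" + dirs + " files=" + files);
    }
}
```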

From what I can tell, s3service.listObjects returns an array of S3Object, where each instance
already has any associated metadata in a HashMap, Content-Type being one of them. So I think
the penalty has already been paid.

Here is the jets3t code.

Are you seeing different behavior, or disabling metadata in jets3t for performance reasons?
Sorry if I seem a little rusty on my jets3t API.

{quote}The code should be doing this. I agree that it's useful - in fact, the other s3 filesystem
needs updating to do this too.{quote}

Sorry, I didn't see where the checksum is validated on a read. I see it in NativeS3FsOutputStream
but not in NativeS3FsInputStream. Does jets3t do this automatically? If so, cool.
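For reference, the kind of read-side check being asked about can be done with the JDK's DigestInputStream: compute an MD5 while draining the stream, then compare it to the expected value (for S3, typically the object's ETag for non-multipart uploads). A minimal sketch, not the actual NativeS3FsInputStream code:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumOnRead {
    // Read the whole stream while feeding a digest, and return the MD5 as hex.
    // A caller would compare this against the checksum stored alongside the object.
    static String md5HexOf(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        try (DigestInputStream din = new DigestInputStream(in, md5)) {
            byte[] buf = new byte[8192];
            while (din.read(buf) != -1) { /* drain; the digest is updated as a side effect */ }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello".getBytes("US-ASCII");
        String computed = md5HexOf(new ByteArrayInputStream(data));
        System.out.println(computed);  // 5d41402abc4b2a76b9719d911017c592
    }
}
```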

{quote}Have you done this elsewhere?{quote}

I believe those are the only two values that can be munged due to an underscore in the authority.

> Add support for reading regular (non-block-based) files from S3 in S3FileSystem
> -------------------------------------------------------------------------------
>                 Key: HADOOP-930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-930
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 0.10.1
>            Reporter: Tom White
>            Assignee: Tom White
>             Fix For: 0.18.0
>         Attachments: hadoop-930-v2.patch, hadoop-930-v3.patch, hadoop-930-v4.patch, hadoop-930.patch,
> People often have input data on S3 that they want to use for a MapReduce job, and the
current S3FileSystem implementation cannot read it since it assumes a block-based format.
> We would add the following metadata to files written by S3FileSystem: an indication that
it is block oriented ("S3FileSystem.type=block") and a filesystem version number ("S3FileSystem.version=1.0").
Regular S3 files would not have the type metadata so S3FileSystem would not try to interpret
them as inodes.
> An extension to write regular files to S3 would not be covered by this change - we could
do this as a separate piece of work (we still need to decide whether to introduce another
scheme - e.g. rename block-based S3 to "s3fs" and call regular S3 "s3" - or whether to just
use a configuration property to control block-based vs. regular writes).
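The metadata scheme proposed in the issue description amounts to a simple check against the object's user metadata. A minimal sketch, assuming the quoted keys ("S3FileSystem.type", "S3FileSystem.version"); the class and helper names are mine, not Hadoop's:

```java
import java.util.HashMap;
import java.util.Map;

public class S3FsTypeCheck {
    // Key quoted from the HADOOP-930 description.
    static final String TYPE_KEY = "S3FileSystem.type";

    // An object is treated as a block-based inode only when it explicitly
    // declares itself as one; anything else is a regular S3 file.
    static boolean isBlockBased(Map<String, String> userMetadata) {
        return "block".equals(userMetadata.get(TYPE_KEY));
    }

    public static void main(String[] args) {
        Map<String, String> blockFile = new HashMap<>();
        blockFile.put("S3FileSystem.type", "block");
        blockFile.put("S3FileSystem.version", "1.0");

        Map<String, String> regularFile = new HashMap<>();  // no type metadata

        System.out.println(isBlockBased(blockFile));    // true: interpret as an inode
        System.out.println(isBlockBased(regularFile));  // false: read as a plain file
    }
}
```

The default-to-regular behavior is what lets existing S3 data, written by other tools, be read without any migration step.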

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
