hadoop-common-dev mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-930) Add support for reading regular (non-block-based) files from S3 in S3FileSystem
Date Thu, 05 Jun 2008 09:25:45 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-930:
-----------------------------

    Attachment: hadoop-930-v5.patch

New patch that works with trunk.

bq. I think Jets3t does validate MD5 checksums on reads - but I'll double check.

That isn't the case: Jets3t doesn't validate MD5 checksums on reads. The stream is sent
straight to the client, so in general it isn't possible to validate the MD5 checksum, particularly
when doing seeks, which use range GETs. Contrast this with S3FileSystem, which retrieves data
in blocks, so it would be easy to add checksum validation there (I've opened HADOOP-3494 for
this). For this issue, I think we should just have write checksum validation.
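
To illustrate why whole-block validation is straightforward while range reads are not, here is a minimal Java sketch. It is not taken from the patch: the class and method names (BlockChecksum, md5Hex, blockMatches) are hypothetical, and it assumes the expected digest is already available as a hex string.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper, not part of Hadoop or Jets3t.
public class BlockChecksum {

    /** Returns the lowercase hex MD5 digest of the given bytes. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** True if the fully retrieved block hashes to the expected digest. */
    static boolean blockMatches(byte[] block, String expectedMd5Hex) {
        return md5Hex(block).equalsIgnoreCase(expectedMd5Hex);
    }

    public static void main(String[] args) {
        byte[] block = "hello".getBytes(StandardCharsets.UTF_8);
        String expected = md5Hex(block);

        // A whole block can be checked against the digest of the whole object.
        System.out.println(blockMatches(block, expected));

        // A byte range (as a seek would fetch) cannot: its digest differs.
        byte[] range = Arrays.copyOfRange(block, 1, 3);
        System.out.println(blockMatches(range, expected));
    }
}
```

The second check fails because a digest computed over the whole object says nothing about an arbitrary sub-range, which is the situation a range GET produces.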

I've also created HADOOP-3495 to address supporting underscores in bucket names.

> Add support for reading regular (non-block-based) files from S3 in S3FileSystem
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-930
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 0.10.1
>            Reporter: Tom White
>            Assignee: Tom White
>             Fix For: 0.18.0
>
>         Attachments: hadoop-930-v2.patch, hadoop-930-v3.patch, hadoop-930-v4.patch, hadoop-930-v5.patch,
>                      hadoop-930.patch, jets3t-0.6.0.jar
>
>
> People often have input data on S3 that they want to use for a Map Reduce job and the
> current S3FileSystem implementation cannot read it since it assumes a block-based format.
> We would add the following metadata to files written by S3FileSystem: an indication that
> it is block oriented ("S3FileSystem.type=block") and a filesystem version number ("S3FileSystem.version=1.0").
> Regular S3 files would not have the type metadata so S3FileSystem would not try to interpret
> them as inodes.
> An extension to write regular files to S3 would not be covered by this change - we could
> do this as a separate piece of work (we still need to decide whether to introduce another
> scheme - e.g. rename block-based S3 to "s3fs" and call regular S3 "s3" - or whether to just
> use a configuration property to control block-based vs. regular writes).
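
The metadata check described in the issue could look roughly like the following Java sketch. Only the two key/value pairs ("S3FileSystem.type=block" and "S3FileSystem.version=1.0") come from the description; the class S3ObjectKind, its methods, and the use of a plain Map for object metadata are assumptions made for illustration, not the code in the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; object metadata is modelled as a simple Map.
public class S3ObjectKind {
    static final String TYPE_KEY = "S3FileSystem.type";
    static final String VERSION_KEY = "S3FileSystem.version";

    /** Metadata a block-based writer would attach, per the issue description. */
    static Map<String, String> blockMetadata() {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(TYPE_KEY, "block");
        metadata.put(VERSION_KEY, "1.0");
        return metadata;
    }

    /** Only objects carrying the type marker are interpreted as inodes. */
    static boolean isBlockBased(Map<String, String> metadata) {
        return "block".equals(metadata.get(TYPE_KEY));
    }

    public static void main(String[] args) {
        System.out.println(isBlockBased(blockMetadata()));   // written by S3FileSystem
        System.out.println(isBlockBased(new HashMap<>()));   // regular S3 file
    }
}
```

A regular S3 file has no type entry, so the check returns false and the reader would treat the object as plain data rather than as an inode.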

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

