hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-574) want FileSystem implementation for Amazon S3
Date Fri, 10 Nov 2006 17:50:39 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-574?page=comments#action_12448811 ] 
Doug Cutting commented on HADOOP-574:

Here are some thoughts I sent Jim about this:

DFS stores files as a sequence of ~100MB blocks.  I think a scheme like this will be useful
for an S3-based FileSystem too.
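To make the block layout concrete, here is a minimal Java sketch (Java being Hadoop's language) of how a byte offset in a file maps onto fixed-size blocks. The class name BlockMath and the exact 100MB constant are illustrative assumptions, not part of Hadoop's FileSystem API.

```java
// Hypothetical helper: maps file offsets onto fixed-size blocks.
final class BlockMath {
    // Approximate block size from the discussion (~100MB); an assumption, not a Hadoop constant.
    static final long BLOCK_SIZE = 100L * 1024 * 1024;

    // Index of the block that contains the given byte offset.
    static long blockIndex(long offset) {
        return offset / BLOCK_SIZE;
    }

    // Position of that byte within its block.
    static long offsetInBlock(long offset) {
        return offset % BLOCK_SIZE;
    }

    // Number of blocks needed for a file of the given length (ceiling division).
    static long blockCount(long fileLength) {
        return (fileLength + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    public static void main(String[] args) {
        long oneTerabyte = 1L << 40;
        System.out.println(blockCount(oneTerabyte));  // 10486
        long offset = 250L * 1024 * 1024;            // 250MB into the file
        System.out.println(blockIndex(offset) + " " + offsetInBlock(offset));  // 2 52428800
    }
}
```

With ~100MB blocks, a 1TB file works out to about 10,500 blocks, and a reader only needs the block index and the offset within it to serve a seek.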

When a file is created, each DFS block is first written locally to a temporary file, and only
when the block is full (or the file is closed) is the block actually written to DFS.  This avoids
trying to trickle data to the network as it is written, which can run into timeout issues, among
other problems.  It also means that when a block write fails, it can easily be retried.
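The write path described above can be sketched as an OutputStream that spools each block to a local temporary file and hands it off only when the block is full or the stream is closed, retrying a failed upload from the intact local copy. BlockStore, the '#' key delimiter, and the retry count are hypothetical stand-ins (for example, for code wrapping an S3 PUT), not a real Hadoop or S3 API.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sink for finished blocks (e.g. wrapping an S3 PUT); not a real Hadoop or S3 API.
interface BlockStore {
    void putBlock(String key, File block) throws IOException;
}

// Spools each block to a local temp file; uploads it only when full or on close().
final class BlockBufferingOutputStream extends OutputStream {
    private final BlockStore store;
    private final String path;
    private final long blockSize;
    private final int maxAttempts;
    private File tmp;
    private OutputStream out;
    private long bytesInBlock;
    private int blockNumber;
    private boolean closed;

    BlockBufferingOutputStream(BlockStore store, String path, long blockSize, int maxAttempts)
            throws IOException {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.store = store;
        this.path = path;
        this.blockSize = blockSize;
        this.maxAttempts = maxAttempts;
        startBlock();
    }

    private void startBlock() throws IOException {
        tmp = File.createTempFile("block-", ".tmp");
        out = new BufferedOutputStream(new FileOutputStream(tmp));
        bytesInBlock = 0;
    }

    @Override
    public void write(int b) throws IOException {
        if (closed) throw new IOException("stream closed");
        out.write(b);
        if (++bytesInBlock == blockSize) {  // block full: ship it, then start the next one
            endBlock();
            startBlock();
        }
    }

    // Finishes the local file, uploads it with retries, then deletes the local copy.
    private void endBlock() throws IOException {
        out.close();
        String key = path + "#" + blockNumber++;  // '#' delimiter is an assumption
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                store.putBlock(key, tmp);
                last = null;
                break;
            } catch (IOException e) {
                last = e;  // the local copy is intact, so the same block can simply be resent
            }
        }
        Files.deleteIfExists(tmp.toPath());
        if (last != null) {
            closed = true;  // give up on this stream once a block cannot be stored
            throw last;
        }
    }

    @Override
    public void close() throws IOException {
        if (closed) return;
        closed = true;
        if (bytesInBlock > 0) {
            endBlock();  // upload the final, partial block
        } else {
            out.close();
            Files.deleteIfExists(tmp.toPath());
        }
    }

    // Demo: 10 bytes, 4-byte blocks, and a store that fails once before succeeding.
    public static void main(String[] args) throws IOException {
        List<String> uploaded = new ArrayList<>();
        int[] failuresLeft = {1};
        BlockStore flaky = (key, block) -> {
            if (failuresLeft[0]-- > 0) throw new IOException("simulated transient failure");
            uploaded.add(key + " (" + block.length() + " bytes)");
        };
        try (OutputStream s = new BlockBufferingOutputStream(flaky, "/tmp/example", 4, 3)) {
            s.write(new byte[10]);
        }
        System.out.println(uploaded);
    }
}
```

Running the demo prints `[/tmp/example#0 (4 bytes), /tmp/example#1 (4 bytes), /tmp/example#2 (2 bytes)]`: the first upload of block 0 fails, is retried from the temporary file, and the short final block is sent on close. Nothing touches the network until a whole block is available locally.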

Very large files (up to a terabyte) should be supported, and breaking files into blocks should
help here too.  S3 limits an object value to 5GB, so each file can be represented as a set
of ~100MB S3 object values.  The set can be listed when the file is opened and used to guide
seeks and reads of the data.  The block number can be appended to the object name after
a delimiter, so that no metadata access is required when opening files or listing directories.
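A sketch of the naming idea, assuming keys of the form `<path>#<blockNumber>` and a listing that returns each key together with its size (as an S3 bucket listing does). The '#' delimiter, the BlockKeys class, and the in-memory Map standing in for a real listing are assumptions for illustration; paths are assumed never to contain the delimiter.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical key scheme "<path>#<blockNumber>"; assumes paths never contain '#'.
final class BlockKeys {
    static final char DELIM = '#';

    static String blockKey(String path, int block) {
        return path + DELIM + block;
    }

    // The file a block key belongs to: everything before the last delimiter.
    static String fileOf(String key) {
        return key.substring(0, key.lastIndexOf(DELIM));
    }

    // Distinct file names from a bucket listing, without reading any metadata.
    static Set<String> filesIn(Iterable<String> keys) {
        Set<String> files = new TreeSet<>();
        for (String key : keys) files.add(fileOf(key));
        return files;
    }

    // Sizes of one file's blocks, ordered by block number, from a key -> size listing.
    static long[] blockSizes(String path, Map<String, Long> listing) {
        String prefix = path + DELIM;
        TreeMap<Integer, Long> blocks = new TreeMap<>();  // sorts block numbers numerically
        for (Map.Entry<String, Long> e : listing.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                blocks.put(Integer.parseInt(e.getKey().substring(prefix.length())), e.getValue());
            }
        }
        long[] sizes = new long[blocks.size()];
        int i = 0;
        for (long size : blocks.values()) sizes[i++] = size;
        return sizes;
    }

    // Index of the block containing `offset`, or -1 if offset is at or past end of file.
    static int blockFor(long[] sizes, long offset) {
        if (offset < 0) throw new IllegalArgumentException("negative offset");
        long start = 0;
        for (int i = 0; i < sizes.length; i++) {
            if (offset < start + sizes[i]) return i;
            start += sizes[i];
        }
        return -1;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        Map<String, Long> listing = new TreeMap<>();  // stands in for an S3 listing
        listing.put(blockKey("/logs/a", 0), 100 * mb);
        listing.put(blockKey("/logs/a", 1), 100 * mb);
        listing.put(blockKey("/logs/a", 2), 30 * mb);
        listing.put(blockKey("/logs/b", 0), 5 * mb);
        long[] sizes = blockSizes("/logs/a", listing);
        System.out.println(Arrays.toString(sizes));     // [104857600, 104857600, 31457280]
        System.out.println(blockFor(sizes, 150 * mb));  // 1
        System.out.println(filesIn(listing.keySet()));  // [/logs/a, /logs/b]
    }
}
```

Block numbers are parsed and sorted numerically because S3 returns keys in lexicographic order, where "#10" sorts before "#2"; zero-padding the block numbers would be an alternative design. At ~100MB per block, every object also stays far below the 5GB limit.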

> want FileSystem implementation for Amazon S3
> --------------------------------------------
>                 Key: HADOOP-574
>                 URL: http://issues.apache.org/jira/browse/HADOOP-574
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Doug Cutting
> An S3-based Hadoop FileSystem would make a great addition to Hadoop.
> It would facilitate use of Hadoop on Amazon's EC2 computing grid, as discussed here:
> http://www.mail-archive.com/hadoop-user@lucene.apache.org/msg00318.html
> This is related to HADOOP-571, which would make Hadoop's FileSystem considerably easier
> to extend.

This message is automatically generated by JIRA.

