hadoop-common-user mailing list archives

From "Tom White" <tom.e.wh...@gmail.com>
Subject Re: s3
Date Tue, 09 Jan 2007 09:29:30 GMT
> S3 has a lot of somewhat weird limits right now, which make some of this
> tricky for the common case. Files can only be stored as a single s3 object
> if they are less than 5gb, and not 2gb-4gb in size, for instance.

Strange - do you know if this is a bug Amazon are planning to fix?

> In any case, I'd vote for not segmenting these cases, and using something
> like the metadata on the uploaded object to tell between "its a full object"
> and "its an inode, listing blocks".

I think it would be relatively easy to add support for reading regular
S3 files in the current implementation. Doing this with S3 metadata
rather than file "magic" seems right to me. There are two things we
would put in the metadata of files written by S3FileSystem: an
indication that it is block oriented ("S3FileSystem.type=block") and a
filesystem version number ("S3FileSystem.version=1.0"). Regular S3
files would not have the type metadata so S3FileSystem would not try
to interpret them as inodes.
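The detection rule described above can be sketched as follows. This is a hypothetical illustration, not Hadoop's actual implementation: the metadata keys "S3FileSystem.type" and "S3FileSystem.version" come from the proposal in this message, but the class and method names are invented.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: deciding how to interpret an S3 object from its
// user metadata. Objects written by S3FileSystem would carry a type
// marker; plain S3 objects lack it and are treated as regular files.
public class S3ObjectKind {

    // Returns true if the metadata marks the object as a block-oriented
    // inode (a listing of blocks) rather than a regular S3 file.
    public static boolean isBlockInode(Map<String, String> metadata) {
        return "block".equals(metadata.get("S3FileSystem.type"));
    }

    public static void main(String[] args) {
        Map<String, String> inodeMeta = new HashMap<>();
        inodeMeta.put("S3FileSystem.type", "block");
        inodeMeta.put("S3FileSystem.version", "1.0");

        Map<String, String> plainMeta = new HashMap<>(); // regular S3 file

        System.out.println(isBlockInode(inodeMeta)); // true
        System.out.println(isBlockInode(plainMeta)); // false
    }
}
```

Because regular files simply lack the key, existing S3 objects need no migration to become readable.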

The choice of which type (regular or block) to use when writing a file
could be controlled by a configuration setting and/or the type of the
parent directory (or the scheme "s3"/"s3fs").
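One way the write-type choice could look, as a rough sketch: the property name "fs.s3.write.type" and the precedence of scheme over configuration are assumptions made up for this example, not anything Hadoop actually defines.

```java
import java.util.Properties;

// Illustrative sketch of choosing the on-S3 format when writing a file,
// driven by the URI scheme and a (hypothetical) configuration property.
public class WriteTypeChooser {

    // "s3fs" scheme always means block-oriented storage; otherwise fall
    // back to the configured default, which is a plain single S3 object.
    public static String chooseType(Properties conf, String scheme) {
        if ("s3fs".equals(scheme)) {
            return "block";
        }
        return conf.getProperty("fs.s3.write.type", "regular");
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(chooseType(conf, "s3"));   // regular
        conf.setProperty("fs.s3.write.type", "block");
        System.out.println(chooseType(conf, "s3"));   // block
        System.out.println(chooseType(conf, "s3fs")); // block
    }
}
```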

Doug's HttpFileSystem would be separate to all this.

> Another thing that would be handy would
> be naming the blocks as a variant on the inode name, so that it's possible
> to "clean up" from erroneous conditions without having to read the full list
> of files, and so that there's an implicit link between an inode's filename
> and the blocks that it stored.

I'm reluctant to name blocks as a variant of the file name, unless we
want to drop support for renames: renaming a file would then mean
renaming (i.e. copying) every one of its blocks. I think a fsck tool
would meet your requirement to clean up from erroneous conditions.

