hadoop-general mailing list archives

From Steve Loughran <steve.lough...@gmail.com>
Subject Re: where do side-projects go in trunk now that contrib/ is gone?
Date Wed, 13 Feb 2013 09:44:32 GMT
On 12 February 2013 22:09, Eli Collins <eli@cloudera.com> wrote:

> I agree that the current place isn't a good one, for both the reasons
> you mention on the jira (and because the people maintaining this code
> don't primarily work on Hadoop). IMO the SwiftFS driver should live in
> the swift source tree (as part of open stack).

If they could be persuaded to move beyond .py, it'd be tempting, because
the FileSystem API is nominally stable.

However, one thing I have noticed during this work is how the behaviour of
FileSystem is underspecified. That's not an issue for HDFS, which gets
stressed rigorously during the hdfs and mapred test runs, but it does
matter for the rest.

There are a lot of assumptions ("files != directories", "mv / anything"
fails) and things that aren't tested: "mv self self" returns true if self
is a file and false if it is a directory; what exception to raise if
readFully goes past the end of a file (and the answer is?).
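
For comparison only, here is a minimal sketch of what plain java.io does in
the readFully case: DataInputStream.readFully throws EOFException when the
stream ends before the buffer is filled. It says nothing about what any
given FileSystem implementation does, which is the underspecified part. The
class and helper names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class ReadFullyPastEof {

    /**
     * Reads `wanted` bytes from an in-memory "file" of `fileLength` bytes
     * and reports whether the read completed or hit end-of-file.
     * (Hypothetical helper; it uses java.io only, not Hadoop's FileSystem.)
     */
    public static String readFully(int fileLength, int wanted) throws IOException {
        byte[] file = new byte[fileLength];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            byte[] buffer = new byte[wanted];
            // readFully either fills the whole buffer or throws.
            in.readFully(buffer);
            return "ok";
        } catch (EOFException e) {
            // java.io's documented behaviour when the stream ends early.
            return "EOFException";
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readFully(8, 4)); // read stays inside the file
        System.out.println(readFully(4, 8)); // read runs past the end
    }
}
```

A cross-filesystem test would have to pick one such answer and assert it
against every implementation.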

We even make an implicit assumption that file operations are consistent:
you get back what you wrote, which turns out not to be guaranteed by any
of the blobstores in all circumstances.

HADOOP-9258 and HADOOP-9119 tighten the spec a bit, but if you look at what
I've been doing for Swift testing, I've created a set of test suites, one
per operation ("ls", "read", "rename"), with tests for scale, directory
depth and width on my todo list:


Then I want to extract those into tests that can be applied to all
filesystems (say in o.a.h.fs.contract), with some per-FS metadata file
providing details on what the FS supports (rename, append, case
sensitivity, MAX_PATH, ...). That gives us better test coverage that can be
applied to all the filesystems in the hadoop codebase (and, being JUnit 4,
you can skip tests in-code by throwing AssumptionViolatedExceptions; these
get reported as skips).
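
A rough sketch of how the per-FS metadata could gate tests, assuming a
properties-style file with made-up keys such as "supports-rename". To keep
it dependency-free, a skip is modelled as an enum value; in a real JUnit 4
test the same check would be org.junit.Assume.assumeTrue(...), which is
what gets reported as a skip. All names here are hypothetical, not an
existing Hadoop API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class ContractSketch {

    /** Stand-in for JUnit's pass / fail / skip reporting. */
    public enum Outcome { PASSED, FAILED, SKIPPED }

    /** A single contract check; returns true when the FS behaved as expected. */
    public interface Check {
        boolean run() throws Exception;
    }

    /** Parses the hypothetical per-filesystem metadata (key names are invented). */
    public static Properties loadOptions(String text) throws IOException {
        Properties options = new Properties();
        options.load(new StringReader(text));
        return options;
    }

    /** A feature counts as unsupported unless the metadata explicitly says "true". */
    public static boolean supports(Properties options, String feature) {
        return Boolean.parseBoolean(options.getProperty("supports-" + feature, "false"));
    }

    /** Runs the check only if the filesystem declares the feature; otherwise skips. */
    public static Outcome runIfSupported(Properties options, String feature, Check check) {
        if (!supports(options, feature)) {
            return Outcome.SKIPPED; // JUnit 4 equivalent: Assume.assumeTrue(false)
        }
        try {
            return check.run() ? Outcome.PASSED : Outcome.FAILED;
        } catch (Exception e) {
            return Outcome.FAILED;
        }
    }

    public static void main(String[] args) throws IOException {
        // Metadata for an imaginary blobstore that can rename but not append.
        Properties blobstore = loadOptions("supports-rename = true\nsupports-append = false\n");
        System.out.println("rename: " + runIfSupported(blobstore, "rename", () -> true));
        System.out.println("append: " + runIfSupported(blobstore, "append", () -> true));
    }
}
```

The same suite can then run unchanged against every filesystem, with only
the metadata file differing.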

It's this expanded test coverage that will be the tightest coupling to

> I'm not -1 on it living in-tree, it's just not my 1st choice. If you
> want to create a top-level directory for 3rd party (read non-local,
> non-hdfs file systems) file systems - go for it. It would be an
> improvement on the current situation (o.a.h.fs.ftp also brings in
> dependencies that most people don't need).  I don't think we need to
> come up with a new top-level "kitchen sink" directory to handle all
> Hadoop extensions, there are a few well-defined extension points that
> can likely be handled independently so logically grouping them
> separately makes sense to me (and perhaps we'll decide some extensions
> are better in-tree and some not).

Makes sense. That I will do in a JIRA.
