hadoop-common-dev mailing list archives

From Stephen Watt <sw...@redhat.com>
Subject Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop
Date Fri, 31 May 2013 20:00:57 GMT
Hi Folks

I am grateful for the interest and for so many responses (the interested parties who responded
are on CC).

I like Steve Loughran's idea of having a few G+ hangouts first to reach some consensus on
how to organize the work, as well as to hear his thoughts about leveraging the Hadoop FileSystem
tests he's already developed for the SWIFT object store. I am also keen to present and discuss
the work we (Red Hat) have done around our perception of the state of the art for filesystem
semantics and their test coverage, to validate whether the community at least has a shared point
of view, which I think would be a good starting point.

What is the protocol for organizing the logistics and collaborating? I am loath to flood
common-dev with "does this time work for you?" emails from the interested parties. Do we create
a high-level JIRA ticket and collaborate and post comments and G+ meetup times on that? Another
option might be the Wiki. I'd be happy to be responsible for tracking progress on https://wiki.apache.org/hadoop/HCFS/Progress
until we are able to break initiatives down into more granular JIRA tickets.

After we've had a few G+ hangouts, for those who would like to meet face to face, I have
also made an all-day reservation for a meeting room that can hold up to 20 people at our Red
Hat office on Castro Street, Mountain View on Tuesday June 25th (the day before Hadoop Summit
and a short drive away). We don't have to use the whole day, but it gives us some flexibility
around the availability of interested parties. I was thinking something along the lines of
10am - 3pm. We are happy to cater lunch. 

Steve Watt

----- Original Message -----
From: "Steve Loughran" <stevel@hortonworks.com>
To: common-dev@hadoop.apache.org
Sent: Friday, May 24, 2013 3:47:04 PM
Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

On 24 May 2013 00:52, Stephen Watt <swatt@redhat.com> wrote:

> Hi Folks
> Hadoop's pluggable filesystem architecture supports the ability to enable
> an alternate filesystem for use with Hadoop by writing a plugin for it. We
> now have several alternate filesystems that have Hadoop FileSystem plugins
> and because this isn't a very well understood topic, I've been working on a
> page on the project wiki to bring this all together -
> http://wiki.apache.org/hadoop/HCFS. At the same time, the Ambari project
> has been opening up Ambari to support any configured Hadoop FileSystem (as
> opposed to just HDFS) over at
> https://issues.apache.org/jira/browse/AMBARI-1817
> My team (over at Red Hat) have been working on writing a Hadoop FileSystem
> plugin for the glusterfs filesystem and have been finding that some of the
> expected semantics of the operations within the Abstract FileSystem class
> are a little ambiguous. With that said, we've joined Steve Loughran in
> attempting to clarify these for both the Hadoop 1.0 and the Hadoop 2.0
> FileSystem class over at https://issues.apache.org/jira/browse/HADOOP-9371
> It seems to me that once we had these semantics defined, it would be good
> for consistency of implementation if we could make sure they are well
> understood and properly implemented by the community of folks writing
> Hadoop FileSystem plugins. To that end, we might work to ensure that those
> semantics are tested within an exhaustive test framework that focuses on
> the abstract Hadoop FileSystem layer. Each FileSystem provider could run
> the tests to ensure their plugin implementation and behavior is consistent
> with the expectation. Perhaps a broader extension of
> https://issues.apache.org/jira/browse/HADOOP-9258.
I have a plan for starting those tests, pulling up the Swift ones when they
are checked in. Big tests that do scale, and that verify the assumptions
that MR, HBase &c make, are where we are weakest. The de facto definition of FS
semantics is the apps, and it's them that currently find the problems (e.g

> If folks are interested in these goals, I could host a
> workshop/discussion/hackday in Mountain View to get local people together
> (perhaps a Google Hangout for the remote folks) to keep the ball rolling on
> the semantics discussion and test creation. As a side note, I think this
> could also turn out to be quite an effective means of introducing FileSystem
> vendors to the ASF and getting them contributing to these aspects of the
> project.
Can we start with some G+ hangouts to get to know each other and have some
broader participation (myself, the others working on Swift, people who have
done S3 (Tom, some of the Amazon folk), etc.)? Then when a workshop is
held, it's got some clearer objectives: "how do we test this". I would want
the FS semantics to be locked down in some online discussions/JIRA rather
than come back after a night's sleep to discover it had been defined with

