hadoop-hdfs-dev mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: FileSystem and FileContext Janitor, at your service !
Date Thu, 06 Mar 2014 11:02:56 GMT
On 5 March 2014 19:07, Jay Vyas <jayunit100@gmail.com> wrote:

> Hi HCFS Community :)
> This is Jay... Some of you know me. I hack on a broad range of file
> system and Hadoop ecosystem interoperability stuff. I just wanted to
> introduce myself and let you folks know I'm going to be working to help
> clean up the existing unit testing frameworks for the FileSystem and
> FileContext APIs. I've listed some bullets below.
> - byte code inspection based code coverage for file system APIs with a tool
> such as Cobertura.
> - HADOOP-9361 points out that there are many different types of file
> systems.

It adds a lot more structure to the tests with an XML declaration of each
FS (in the -test JAR).

It's pretty much complete except for some discrepancies between file:// and
hdfs:// that I need to fix in file://
- handling of mkdirs if the destination exists and is a file (currently:
returns 0)
- seek() on a closed stream. Currently appears to work, at least on OS X.
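
A rough way to poke at both cases on a local disk, using only the JDK and not the Hadoop FileSystem API. The class name LocalFsQuirkProbe and its method names are made up for illustration, and the results describe java.io behaviour on the machine it runs on, not what file:// or HDFS do:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;

/** Hypothetical JDK-only probe; not part of Hadoop or of the HADOOP-9361 patch. */
class LocalFsQuirkProbe {

    /** Reports what File.mkdirs() returns when the target already exists as a regular file. */
    static boolean mkdirsOverExistingFile() throws IOException {
        File f = Files.createTempFile("probe", ".dat").toFile();
        try {
            return f.mkdirs();
        } finally {
            f.delete();
        }
    }

    /** Reports whether seek() on an already-closed RandomAccessFile raised an IOException. */
    static boolean seekOnClosedStreamFails() throws IOException {
        File f = Files.createTempFile("probe", ".dat").toFile();
        try {
            RandomAccessFile raf = new RandomAccessFile(f, "r");
            raf.close();
            try {
                raf.seek(0);
                return false; // the seek was silently accepted
            } catch (IOException expected) {
                return true;  // the seek was rejected
            }
        } finally {
            f.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("mkdirs over existing file returned: " + mkdirsOverExistingFile());
        System.out.println("seek on closed stream failed: " + seekOnClosedStreamFails());
    }
}
```

A contract test would then pin down one answer for each case and require every filesystem to give it.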

> - Creating mock file systems which can be used to validate API tests, which
> emulate different FS semantics (atomic directory creation, eventual
> consistency, strict consistency, POSIX compliance, append support, etc...)

That's an interesting thought, adding some inconsistency semantics on top
of an existing FS to emulate blobstore
behaviour. How would you do this? An in-memory RAM FS could do some of this,
but to test YARN it has to be visible across processes.
We'd really need an in-RAM simulation of semantics that also offered an RPC
API of some form.
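
As a starting point, here is a minimal single-process sketch of one such inconsistency: reads see a new object straight away, but listings lag behind by a configurable number of logical clock ticks. Everything in it (the class EventuallyConsistentStore, its methods, the tick-based clock) is hypothetical, and it does nothing about the cross-process visibility or RPC layer mentioned above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory store whose listings lag behind writes. */
class EventuallyConsistentStore {
    private final Map<String, byte[]> data = new TreeMap<>();
    private final Map<String, Long> visibleAt = new TreeMap<>();
    private final long listingDelayTicks;
    private long now = 0; // logical clock, advanced explicitly by tick()

    EventuallyConsistentStore(long listingDelayTicks) {
        this.listingDelayTicks = listingDelayTicks;
    }

    /** Stores the bytes; a new path only shows up in listings after the delay. */
    synchronized void put(String path, byte[] bytes) {
        data.put(path, bytes.clone());
        // Overwrites keep the original visibility time, so existing entries stay listed.
        visibleAt.putIfAbsent(path, now + listingDelayTicks);
    }

    /** Direct reads are immediately consistent in this sketch. */
    synchronized byte[] get(String path) {
        byte[] bytes = data.get(path);
        return bytes == null ? null : bytes.clone();
    }

    /** Lists only the paths under the prefix whose visibility time has passed. */
    synchronized List<String> list(String prefix) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : visibleAt.entrySet()) {
            if (e.getKey().startsWith(prefix) && e.getValue() <= now) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Advances the logical clock. */
    synchronized void tick(long ticks) {
        now += ticks;
    }
}
```

A logical clock rather than wall-clock time keeps tests deterministic: a test decides exactly when the listing catches up.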

> Is anyone interested in the above issues or have any opinions on how /
> where I should get started?
> Our end goal is to have a more transparent and portable set of test APIs
> for the hadoop file system implementors, across the board : so that we can
> all test our individual implementations confidently.
> So, anywhere I can lend a hand - let me know.  I think this effort will
> require all of us in the file system community to join forces, and it will
> benefit us all immensely in the long run as well.

I should do another '9361 patch, once I get those final quirks in file://
sorted out so that it is consistent with HDFS.
1. HDFS is, and continues to be, the definition of the semantics of all
filesystem interfaces.
2. It'd be good if we understood more about which accidental features of the
FS code depends on, e.g. does anything rely on mkdirs() being atomic? On
0x00 being a valid char in a filename? How do programs fail when blocksize
is too small (try setting it to 1 and see how Pig reacts)? How much code
depends on close() being near-instantaneous and never failing? Blobstores
do their write then, and can break both these requirements, which is
something a mock FS could add atop file://.
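
To make the close() point concrete, here is a minimal sketch of an output stream that behaves the way a blobstore client might: writes only fill a local buffer, and the data is published (slowly, and possibly with a failure) only when close() is called. The class BlobstoreLikeOutputStream, the in-memory map standing in for the store, and the delay/failure switches are all hypothetical, not taken from any real connector:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Map;

/** Hypothetical stream that defers the real write to close(), like a blobstore upload. */
class BlobstoreLikeOutputStream extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final Map<String, byte[]> store; // stand-in for the remote store
    private final String key;
    private final long closeDelayMillis;     // simulated upload latency
    private final boolean failOnClose;       // simulated upload failure
    private boolean closed;

    BlobstoreLikeOutputStream(Map<String, byte[]> store, String key,
                              long closeDelayMillis, boolean failOnClose) {
        this.store = store;
        this.key = key;
        this.closeDelayMillis = closeDelayMillis;
        this.failOnClose = failOnClose;
    }

    @Override
    public void write(int b) throws IOException {
        if (closed) {
            throw new IOException("stream closed");
        }
        buffer.write(b); // nothing reaches the store yet
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        try {
            Thread.sleep(closeDelayMillis); // close() is no longer near-instantaneous
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted during simulated upload", e);
        }
        if (failOnClose) {
            throw new IOException("simulated upload failure for " + key);
        }
        store.put(key, buffer.toByteArray()); // data becomes visible only now
    }
}
```

Code that ignores exceptions from close(), or assumes the data is durable before close() returns, would misbehave against a stream like this.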

