hadoop-common-commits mailing list archives

From ste...@apache.org
Subject svn commit: r1607596 [2/5] - in /hadoop/common/trunk/hadoop-common-project/hadoop-common/src: main/java/org/apache/hadoop/fs/ main/java/org/apache/hadoop/fs/ftp/ main/java/org/apache/hadoop/fs/s3/ main/java/org/apache/hadoop/fs/s3native/ site/markdown/...
Date Thu, 03 Jul 2014 12:04:52 GMT
Added: hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md?rev=1607596&view=auto
==============================================================================
--- hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md (added)
+++ hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md Thu Jul  3 12:04:50 2014
@@ -0,0 +1,379 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+<!--  ============================================================= -->
+<!--  CLASS: FSDataInputStream -->
+<!--  ============================================================= -->
+
+
+# Class `FSDataInputStream extends DataInputStream`
+
+The core behavior of `FSDataInputStream` is defined by `java.io.DataInputStream`,
+with extensions that add key assumptions to the system.
+
+1. The source is a local or remote filesystem.
+1. The stream being read references a finite array of bytes.
+1. The length of the data does not change during the read process.
+1. The contents of the data do not change during the read process.
+1. The source file remains present during the read process.
+1. Callers may use `Seekable.seek()` to seek to offsets within the array of bytes, with future
+reads starting at this offset.
+1. The cost of forward and backward seeks is low.
+1. There is no requirement for the stream implementation to be thread-safe.
+ Callers MUST assume that instances are not thread-safe.
+
+
+Files are opened via `FileSystem.open(p)`, which, if successful, returns:
+
+    result = FSDataInputStream(0, FS.Files[p])
+
+The stream can be modeled as:
+
+    FSDIS = (pos, data[], isOpen)
+
+with access functions:
+
+    pos(FSDIS)
+    data(FSDIS)
+    isOpen(FSDIS)
+
+**Implicit invariant**: the size of the data stream equals the size of the
+file as returned by `FileSystem.getFileStatus(Path p)`:
+
+    forall p in dom(FS.Files) :
+    len(data(FSDIS)) == FS.getFileStatus(p).length
+
+
+### `Closeable.close()`
+
+The semantics of `java.io.Closeable` are defined in the interface definition
+within the JRE.
+
+The operation MUST be idempotent; the following sequence is not an error:
+
+    FSDIS.close();
+    FSDIS.close();
+
+#### Implementation Notes
+
+* Implementations SHOULD be robust against failure. If an inner stream
+is to be closed, it SHOULD first be checked for being `null`.
+
+* Implementations SHOULD NOT raise `IOException` exceptions (or any other exception)
+during this operation. Client applications often ignore these, or may fail
+unexpectedly.
+
+
+
+
+
+#### Postconditions
+
+
+    FSDIS' = ((undefined), (undefined), False)
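+
+A minimal sketch of a `close()` that follows the implementation notes above,
+assuming a single wrapped inner stream; the class and field names are
+illustrative, not the real `FSDataInputStream` internals:
+
+    import java.io.Closeable;
+    import java.io.IOException;
+    import java.io.InputStream;
+
+    /** Illustrative only; not the actual FSDataInputStream implementation. */
+    class RobustCloseExample implements Closeable {
+      private InputStream wrapped;      // the inner stream; may already be null
+      private boolean closed = false;
+
+      @Override
+      public synchronized void close() {
+        if (closed) {
+          return;                       // idempotent: a second close() is a no-op
+        }
+        closed = true;
+        try {
+          if (wrapped != null) {        // check the inner stream for null first
+            wrapped.close();
+          }
+        } catch (IOException ignored) {
+          // swallowed: close() SHOULD NOT raise exceptions to the caller
+        } finally {
+          wrapped = null;
+        }
+      }
+    }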
+
+
+### `Seekable.getPos()`
+
+Return the current position. The outcome when a stream is closed is undefined.
+
+#### Preconditions
+
+    isOpen(FSDIS)
+
+#### Postconditions
+
+    result = pos(FSDIS)
+
+
+### `InputStream.read()`
+
+Return the data at the current position.
+
+1. Implementations SHOULD fail when a stream is closed.
+1. There is no limit on how long `read()` may take to complete.
+
+#### Preconditions
+
+    isOpen(FSDIS)
+
+#### Postconditions
+
+    if ( pos < len(data) ):
+        FSDIS' = (pos + 1, data, True)
+        result = data[pos]
+    else
+        result = -1
+
+
+### `InputStream.read(buffer[], offset, length)`
+
+Read `length` bytes of data into the destination buffer, starting at offset
+`offset`.
+
+#### Preconditions
+
+    isOpen(FSDIS)
+    buffer != null else raise NullPointerException
+    length >= 0
+    offset >= 0
+    offset < len(buffer)
+    length <= len(buffer) - offset
+
+Exceptions that may be raised on precondition failure are
+
+    IllegalArgumentException
+    ArrayIndexOutOfBoundsException
+    RuntimeException
+
+#### Postconditions
+
+    if length == 0 :
+      result = 0
+
+    elseif pos >= len(data):
+      result = -1
+
+    else
+      let l = min(length, len(data)-pos) :
+          buffer' = buffer where forall i in [0..l-1]:
+              buffer'[offset+i] = data[pos+i]
+          FSDIS' = (pos+l, data, True)
+          result = l
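+
+A caller-side sketch of how this contract is normally consumed: since a single
+call may return fewer than `length` bytes, callers loop until either the
+request is satisfied or -1 signals the end of the stream. The helper name is
+hypothetical, and `in` may be any `InputStream`, such as one returned by
+`FileSystem.open()`:
+
+    import java.io.IOException;
+    import java.io.InputStream;
+
+    class ReadLoopExample {
+      static int readAsMuchAsPossible(InputStream in, byte[] buffer, int offset,
+          int length) throws IOException {
+        int total = 0;
+        while (total < length) {
+          int n = in.read(buffer, offset + total, length - total);
+          if (n == -1) {
+            break;                 // end of stream before 'length' bytes arrived
+          }
+          total += n;
+        }
+        return total;              // number of bytes actually copied into buffer
+      }
+    }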
+
+### `Seekable.seek(s)`
+
+
+#### Preconditions
+
+Not all subclasses implement the Seek operation:
+
+    supported(FSDIS, Seekable.seek) else raise [UnsupportedOperationException, IOException]
+
+If the operation is supported, the file SHOULD be open:
+
+    isOpen(FSDIS)
+
+Some filesystems do not perform this check, relying on the `read()` contract
+to reject reads on a closed stream (e.g. `RawLocalFileSystem`).
+
+A `seek(0)` MUST always succeed; any other seek position must be
+positive and less than the length of the stream:
+
+    (s == 0) or ((s > 0) and (s < len(data))) else raise [EOFException, IOException]
+
+Some FileSystems do not raise an exception if this condition is not met. They
+instead return -1 on any `read()` operation where, at the time of the read,
+`len(data(FSDIS)) < pos(FSDIS)`.
+
+#### Postconditions
+
+    FSDIS' = (s, data, True)
+
+There is an implicit invariant: a seek to the current position is a no-op
+
+    seek(getPos())
+
+Implementations may recognise this operation and bypass all other precondition
+checks, leaving the input stream unchanged.
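+
+As a usage sketch (assuming `fs` is any Hadoop `FileSystem` and `path` an
+existing file; the helper is hypothetical), a caller can remember an offset,
+read past it, and seek back so that future reads start at that offset again:
+
+    import java.io.IOException;
+    import org.apache.hadoop.fs.FSDataInputStream;
+    import org.apache.hadoop.fs.FileSystem;
+    import org.apache.hadoop.fs.Path;
+
+    class SeekUsageExample {
+      static void reReadFirstByte(FileSystem fs, Path path) throws IOException {
+        try (FSDataInputStream in = fs.open(path)) {
+          long mark = in.getPos();
+          int first = in.read();
+          in.seek(mark);           // future reads start at this offset again
+          int again = in.read();
+          assert first == again;   // the data does not change during the read
+        }
+      }
+    }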
+
+
+### `Seekable.seekToNewSource(offset)`
+
+This operation instructs the source to retrieve `data[]` from a different
+source from the current source. This is only relevant if the filesystem supports
+multiple replicas of a file and there is more than 1 replica of the
+data at offset `offset`.
+
+
+#### Preconditions
+
+Not all subclasses implement this operation, and instead
+either raise an exception or return `False`.
+
+    supported(FSDIS, Seekable.seekToNewSource) else raise [UnsupportedOperationException, IOException]
+
+Examples: `CompressionInputStream` , `HttpFSFileSystem`
+
+If supported, the file must be open:
+
+    isOpen(FSDIS)
+
+#### Postconditions
+
+The majority of subclasses that do not implement this operation simply
+fail.
+
+    if not supported(FSDIS, Seekable.seekToNewSource(s)):
+        result = False
+
+Examples: `RawLocalFileSystem` , `HttpFSFileSystem`
+
+If the operation is supported and there is a new location for the data:
+
+        FSDIS' = (pos, data', true)
+        result = True
+
+The new data is the original data (or an updated version of it, as covered
+in the Consistency section below), but the block containing the data at `offset`
+is sourced from a different replica.
+
+If there is no other copy, `FSDIS` is not updated; the response indicates this:
+
+        result = False
+
+Outside of test methods, the primary use of this method is in the `FSInputChecker`
+class, which can react to a checksum error in a read by attempting to source
+the data elsewhere. If a new source can be found, it attempts to reread and
+recheck that portion of the file.
+
+## interface `PositionedReadable`
+
+The `PositionedReadable` operations provide the ability to
+read data into a buffer from a specific position in
+the data stream.
+
+Although the interface declares that it must be thread safe,
+some of the implementations do not follow this guarantee.
+
+#### Implementation preconditions
+
+Not all `FSDataInputStream` implementations support these operations. Those that do
+not implement `Seekable.seek()` do not implement the `PositionedReadable`
+interface.
+
+    supported(FSDIS, Seekable.seek) else raise [UnsupportedOperationException, IOException]
+
+This could be considered obvious: if a stream is not Seekable, a client
+cannot seek to a location. It is also a side effect of the
+base class implementation, which uses `Seekable.seek()`.
+
+
+**Implicit invariant**: for all `PositionedReadable` operations, the value
+of `pos` is unchanged at the end of the operation
+
+    pos(FSDIS') == pos(FSDIS)
+
+
+There are no guarantees that this holds *during* the operation.
+
+
+#### Failure states
+
+For any operations that fail, the contents of the destination
+`buffer` are undefined. Implementations may overwrite part
+or all of the buffer before reporting a failure.
+
+
+
+### `int PositionedReadable.read(position, buffer, offset, length)`
+
+#### Preconditions
+
+    position >= 0 else raise [IllegalArgumentException, RuntimeException]
+    len(buffer) - offset >= length else raise [IndexOutOfBoundsException, RuntimeException]
+    length >= 0
+    offset >= 0
+
+#### Postconditions
+
+The amount of data read is the lesser of `length` and the amount
+of data available from the specified position:
+
+    let available = min(length, len(data)-position)
+    buffer'[offset..(offset+available-1)] = data[position..position+available -1]
+    result = available
+
+
+### `void PositionedReadable.readFully(position, buffer, offset, length)`
+
+#### Preconditions
+
+    position >= 0 else raise [IllegalArgumentException, RuntimeException]
+    length >= 0
+    offset >= 0
+    (position + length) <= len(data) else raise [EOFException, IOException]
+    len(buffer) - offset >= length
+
+#### Postconditions
+
+The precondition ensures that `length` bytes are available from the specified
+position, all of which are read into the buffer:
+
+    buffer'[offset..(offset+length-1)] = data[position..(position + length -1)]
+
+### `PositionedReadable.readFully(position, buffer)`
+
+The semantics of this are exactly equivalent to
+
+    readFully(position, buffer, 0, len(buffer))
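+
+A caller-side sketch (hypothetical helper) of reading a fixed-size block at an
+absolute offset with `readFully()`: the buffer is either filled completely or an
+exception is raised, and the stream's own seek position is left where it was:
+
+    import java.io.IOException;
+    import org.apache.hadoop.fs.FSDataInputStream;
+
+    class PositionedReadExample {
+      static byte[] readBlockAt(FSDataInputStream in, long position, int size)
+          throws IOException {
+        byte[] block = new byte[size];
+        long before = in.getPos();
+        in.readFully(position, block, 0, size);
+        assert in.getPos() == before;   // pos(FSDIS') == pos(FSDIS)
+        return block;
+      }
+    }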
+
+
+## Consistency
+
+* All readers, local and remote, of a data stream FSDIS provided from a `FileSystem.open(p)`
+are expected to receive access to the data of `FS.Files[p]` at the time of opening.
+* If the underlying data is changed during the read process, these changes MAY or
+MAY NOT be visible.
+* Such changes MAY be only partially visible.
+
+
+At time `t0`
+
+    FSDIS0 = FS.open(p) = (0, data0[])
+
+At time `t1`
+
+    FS' = FS where FS'.Files[p] = data1
+
+From time `t >= t1`, the value of `FSDIS0` is undefined.
+
+It may be unchanged
+
+    FSDIS0.data == data0
+
+    forall l in [0..len(FSDIS0.data)-1]:
+      FSDIS0.read() == data0[l]
+
+
+It may pick up the new data
+
+    FSDIS0.data == data1
+
+    forall l in [0..len(FSDIS0.data)-1]:
+      FSDIS0.read() == data1[l]
+
+It may be inconsistent, such that a read of an offset returns
+data from either of the datasets
+
+    forall l in [0..len(FSDIS0.data)-1]:
+      (FSDIS0.read(l) == data0[l]) or (FSDIS0.read(l) == data1[l])
+
+That is, every value read may be from the original or updated file.
+
+It may also be inconsistent on repeated reads of same offset, that is
+at time `t2 > t1`:
+
+    r2 = FSDIS0.read(l)
+
+While at time `t3 > t2`:
+
+    r3 = FSDIS0.read(l)
+
+It may be that `r3 != r2`. (That is, some of the data may be cached or replicated,
+and on a subsequent read, a different version of the file's contents is returned).
+
+
+Similarly, if the data at the path `p` is deleted, this change MAY or MAY
+NOT be visible during read operations performed on `FSDIS0`.

Added: hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/index.md
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/index.md?rev=1607596&view=auto
==============================================================================
--- hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/index.md (added)
+++ hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/index.md Thu Jul  3 12:04:50 2014
@@ -0,0 +1,37 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# The Hadoop FileSystem API Definition
+
+This is a specification of the Hadoop FileSystem APIs, which models
+the contents of a filesystem as a set of paths that are either directories,
+symbolic links, or files.
+
+There is surprisingly little prior art in this area. There are multiple specifications of
+Unix filesystems as a tree of inodes, but nothing public which defines the
+notion of "Unix filesystem as a conceptual model for data storage access".
+
+This specification attempts to do that; to define the Hadoop FileSystem model
+and APIs so that multiple filesystems can implement the APIs and present a consistent
+model of their data to applications. It does not attempt to formally specify any of the
+concurrency behaviors of the filesystems, other than to document the behaviours exhibited by
+HDFS as these are commonly expected by Hadoop client applications.
+
+1. [Introduction](introduction.html)
+1. [Notation](notation.html)
+1. [Model](model.html)
+1. [FileSystem class](filesystem.html)
+1. [FSDataInputStream class](fsdatainputstream.html)
+2. [Testing with the Filesystem specification](testing.html)
+2. [Extending the specification and its tests](extending.html)

Added: hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/introduction.md
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/introduction.md?rev=1607596&view=auto
==============================================================================
--- hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/introduction.md (added)
+++ hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/introduction.md Thu Jul  3 12:04:50 2014
@@ -0,0 +1,377 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Introduction
+
+This document defines the required behaviors of a Hadoop-compatible filesystem
+for implementors and maintainers of the Hadoop filesystem, and for users of
+the Hadoop FileSystem APIs.
+
+Most of the Hadoop operations are tested against HDFS in the Hadoop test
+suites, initially through `MiniDFSCluster`, before release by vendor-specific
+'production' tests, and implicitly by the Hadoop stack above it.
+
+HDFS's actions have been modeled on POSIX filesystem behavior, using the actions and
+return codes of Unix filesystem actions as a reference. Even so, there
+are places where HDFS diverges from the expected behaviour of a POSIX
+filesystem.
+
+The behaviour of other Hadoop filesystems is not as rigorously tested.
+The bundled S3 FileSystem makes Amazon's S3 Object Store ("blobstore")
+accessible through the FileSystem API. The Swift FileSystem driver provides similar
+functionality for the OpenStack Swift blobstore. The Azure object storage
+FileSystem in branch-1-win talks to Microsoft's Azure equivalent. All of these
+bind to object stores, which do have different behaviors, especially regarding
+consistency guarantees, and atomicity of operations.
+
+The "Local" FileSystem provides access to the underlying filesystem of the
+platform. Its behavior is defined by the operating system and can
+behave differently from HDFS. Examples of local filesystem quirks include
+case-sensitivity, action when attempting to rename a file atop another file,
+and whether it is possible to `seek()` past
+the end of the file.
+
+There are also filesystems implemented by third parties that assert
+compatibility with Apache Hadoop. There is no formal compatibility suite, and
+hence no way for anyone to declare compatibility except in the form of their
+own compatibility tests.
+
+These documents *do not* attempt to provide a normative definition of compatibility.
+Passing the associated test suites *does not* guarantee correct behavior of applications.
+
+What the test suites do define is the expected set of actions&mdash;failing these
+tests will highlight potential issues.
+
+By making each aspect of the contract tests configurable, it is possible to
+declare how a filesystem diverges from parts of the standard contract.
+This is information which can be conveyed to users of the filesystem.
+
+### Naming
+
+This document follows RFC 2119 rules regarding the use of MUST, MUST NOT, MAY,
+and SHALL. MUST NOT is treated as normative.
+
+## Implicit assumptions of the Hadoop FileSystem APIs
+
+The original `FileSystem` class and its usages are based on an implicit set of
+assumptions. Chiefly, that HDFS is
+the underlying FileSystem, and that it offers a subset of the behavior of a
+POSIX filesystem (or at least the implementation of the POSIX filesystem
+APIs and model provided by Linux filesystems).
+
+Irrespective of the API, it's expected that all Hadoop-compatible filesystems
+present the model of a filesystem implemented in Unix:
+
+* It's a hierarchical directory structure with files and directories.
+
+* Files contain zero or more bytes of data.
+
+* You cannot put files or directories under a file.
+
+* Directories contain zero or more files.
+
+* A directory entry has no data itself.
+
+* You can write arbitrary binary data to a file. When the file's contents
+  are read, from anywhere inside or outside of the cluster, the data is returned.
+
+* You can store many gigabytes of data in a single file.
+
+* The root directory, `"/"`, always exists, and cannot be renamed.
+
+* The root directory, `"/"`, is always a directory, and cannot be overwritten by a file write operation.
+
+* Any attempt to recursively delete the root directory will delete its contents (barring
+  lack of permissions), but will not delete the root path itself.
+
+* You cannot rename/move a directory under itself.
+
+* You cannot rename/move a directory atop any existing file other than the
+  source file itself.
+
+* Directory listings return all the data files in the directory (i.e.
+there may be hidden checksum files, but all the data files are listed).
+
+* The attributes of a file in a directory listing (e.g. owner, length) match
+ the actual attributes of a file, and are consistent with the view from an
+ opened file reference.
+
+* Security: if the caller lacks the permissions for an operation, it will fail and raise an error.
+
+### Path Names
+
+* A Path is composed of Path elements separated by `"/"`.
+
+* A path element is a Unicode string of 1 or more characters.
+
+* A path element MUST NOT include the characters `":"` or `"/"`.
+
+* A path element SHOULD NOT include characters of ASCII/UTF-8 value 0-31.
+
+* A path element MUST NOT be `"."` or `".."`.
+
+* Note also that the Azure blob store documents say that paths SHOULD NOT use
+ a trailing `"."` (as their .NET URI class strips it).
+
+* Paths are compared based on Unicode code-points.
+
+* Case-insensitive and locale-specific comparisons MUST NOT be used.
+
+### Security Assumptions
+
+Except in the special section on security, this document assumes the client has
+full access to the FileSystem. Accordingly, the majority of items in the list
+do not add the qualification "assuming the user has the rights to perform the
+operation with the supplied parameters and paths".
+
+The failure modes when a user lacks security permissions are not specified.
+
+### Networking Assumptions
+
+This document assumes that all network operations succeed. All statements
+can be assumed to be qualified as *"assuming the operation does not fail due
+to a network availability problem"*.
+
+* The final state of a FileSystem after a network failure is undefined.
+
+* The immediate consistency state of a FileSystem after a network failure is undefined.
+
+* If a network failure can be reported to the client, the failure MUST be an
+instance of `IOException` or subclass thereof.
+
+* The exception details SHOULD include diagnostics suitable for an experienced
+Java developer _or_ operations team to begin diagnostics. For example, source
+and destination hostnames and ports on a ConnectionRefused exception.
+
+* The exception details MAY include diagnostics suitable for inexperienced
+developers to begin diagnostics. For example Hadoop tries to include a
+reference to [ConnectionRefused](http://wiki.apache.org/hadoop/ConnectionRefused) when a TCP
+connection request is refused.
+
+<!--  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
+
+## Core Expectations of a Hadoop Compatible FileSystem
+
+Here are the core expectations of a Hadoop-compatible FileSystem.
+Some FileSystems do not meet all these expectations; as a result,
+some programs may not work as expected.
+
+### Atomicity
+
+There are some operations that MUST be atomic. This is because they are
+often used to implement locking/exclusive access between processes in a cluster.
+
+1. Creating a file. If the `overwrite` parameter is false, the check and creation
+MUST be atomic.
+1. Deleting a file.
+1. Renaming a file.
+1. Renaming a directory.
+1. Creating a single directory with `mkdir()`.
+
+* Recursive directory deletion MAY be atomic. Although HDFS offers atomic
+recursive directory deletion, none of the other Hadoop FileSystems
+offer such a guarantee (including local FileSystems).
+
+Most other operations come with no requirements or guarantees of atomicity.
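+
+As an illustration of why the atomic check-and-create matters, here is a sketch
+of the lock-file idiom (the helper is hypothetical, not a Hadoop API): when
+several processes race to create the same path with `overwrite=false`, at most
+one of them succeeds.
+
+    import java.io.IOException;
+    import org.apache.hadoop.fs.FileSystem;
+    import org.apache.hadoop.fs.Path;
+
+    class LockFileExample {
+      static boolean tryAcquire(FileSystem fs, Path lockFile) throws IOException {
+        try {
+          fs.create(lockFile, false).close();   // overwrite=false: atomic check + create
+          return true;                          // this process now "holds the lock"
+        } catch (IOException raced) {
+          // typically a "file already exists" failure: another process won the race
+          return false;
+        }
+      }
+    }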
+
+
+
+### Consistency
+
+The consistency model of a Hadoop FileSystem is *one-copy-update-semantics*;
+that of a traditional local POSIX filesystem. Note that even NFS relaxes
+some constraints about how fast changes propagate.
+
+* *Create.* Once the `close()` operation on an output stream writing a newly
+created file has completed, in-cluster operations querying the file metadata
+and contents MUST immediately see the file and its data.
+
+* *Update.* Once the `close()` operation on an output stream that has updated
+an existing file has completed, in-cluster operations querying the file metadata
+and contents MUST immediately see the new data.
+
+* *Delete.* Once a `delete()` operation on a path other than "/" has completed successfully,
+it MUST NOT be visible or accessible. Specifically,
+`listStatus()`, `open()` ,`rename()` and `append()`
+ operations MUST fail.
+
+* *Delete then create.* When a file is deleted then a new file of the same name created, the new file
+ MUST be immediately visible and its contents accessible via the FileSystem APIs.
+
+* *Rename.* After a `rename()`  has completed, operations against the new path MUST
+succeed; attempts to access the data against the old path MUST fail.
+
+* The consistency semantics inside of the cluster MUST be the same as outside of the cluster.
+All clients querying a file that is not being actively manipulated MUST see the
+same metadata and data irrespective of their location.
+
+### Concurrency
+
+There are no guarantees of isolated access to data: if one client is interacting
+with a remote file and another client changes that file, the changes may or may
+not be visible.
+
+### Operations and failures
+
+* All operations MUST eventually complete, successfully or unsuccessfully.
+
+* The time to complete an operation is undefined and may depend on
+the implementation and on the state of the system.
+
+* Operations MAY throw a `RuntimeException` or subclass thereof.
+
+* Operations SHOULD raise all network, remote, and high-level problems as
+an `IOException` or subclass thereof, and SHOULD NOT raise a
+`RuntimeException` for such problems.
+
+* Operations SHOULD report failures by way of raised exceptions, rather
+than specific return codes of an operation.
+
+* In the text, when an exception class is named, such as `IOException`,
+the raised exception MAY be an instance or subclass of the named exception.
+It MUST NOT be a superclass.
+
+* If an operation is not implemented in a class, the implementation must
+throw an `UnsupportedOperationException`.
+
+* Implementations MAY retry failed operations until they succeed. If they do this,
+they SHOULD do so in such a way that the *happens-before* relationship between
+any sequence of operations meets the consistency and atomicity requirements
+stated. See [HDFS-4849](https://issues.apache.org/jira/browse/HDFS-4849)
+for an example of this: HDFS does not implement any retry feature that
+could be observable by other callers.
+
+### Undefined capacity limits
+
+Here are some limits to FileSystem capacity that have never been explicitly
+defined.
+
+1. The maximum number of files in a directory.
+
+1. The maximum number of directories in a directory.
+
+1. Maximum total number of entries (files and directories) in a filesystem.
+
+1. The maximum length of a filename under a directory (HDFS: 8000).
+
+1. `MAX_PATH` - the total length of the entire directory tree referencing a
+file. Blobstores tend to stop at ~1024 characters.
+
+1. The maximum depth of a path (HDFS: 1000 directories).
+
+1. The maximum size of a single file.
+
+### Undefined timeouts
+
+Timeouts for operations are not defined at all, including:
+
+* The maximum completion time of blocking FS operations.
+MAPREDUCE-972 documents how `distcp` broke on slow s3 renames.
+
+* The timeout for idle read streams before they are closed.
+
+* The timeout for idle write streams before they are closed.
+
+The blocking-operation timeout is in fact variable in HDFS, as sites and
+clients may tune the retry parameters so as to convert filesystem failures and
+failovers into pauses in operation. Instead there is a general assumption that
+FS operations are "fast but not as fast as local FS operations", and that the latency of data
+reads and writes scale with the volume of data. This
+assumption by client applications reveals a more fundamental one: that the filesystem is "close"
+as far as network latency and bandwidth are concerned.
+
+There are also some implicit assumptions about the overhead of some operations.
+
+1. `seek()` operations are fast and incur little or no network delays. [This
+does not hold on blob stores]
+
+1. Directory list operations are fast for directories with few entries.
+
+1. Directory list operations on directories with many entries may
+incur a cost that is `O(entries)`. Hadoop 2 added iterative listing to
+handle the challenge of listing directories with millions of entries without
+buffering, at the cost of consistency.
+
+1. A `close()` of an `OutputStream` is fast, irrespective of whether or not
+the file operation has succeeded.
+
+1. The time to delete a directory is independent of the number of
+child entries.
+
+### Object Stores vs. Filesystems
+
+This specification refers to *Object Stores* in places, often using the
+term *Blobstore*. Hadoop does provide FileSystem client classes for some of these
+even though they violate many of the requirements. This is why, although
+Hadoop can read and write data in an object store, the two which Hadoop ships
+with direct support for &mdash;Amazon S3 and OpenStack Swift&mdash; cannot
+be used as a direct replacement for HDFS.
+
+*What is an Object Store?*
+
+An object store is a data storage service, usually accessed over HTTP/HTTPS.
+A `PUT` request uploads an object/"Blob"; a `GET` request retrieves it; ranged
+`GET` operations permit portions of a blob to be retrieved.
+To delete the object, the HTTP `DELETE` operation is invoked.
+
+Objects are stored by name: a string, possibly with "/" symbols in them. There
+is no notion of a directory; arbitrary names can be assigned to objects &mdash;
+within the limitations of the naming scheme imposed by the service's provider.
+
+The object stores invariably provide an operation to retrieve objects with
+a given prefix; a `GET` operation on the root of the service with the
+appropriate query parameters.
+
+Object stores usually prioritize availability &mdash;there is no single point
+of failure equivalent to the HDFS NameNode(s). They also strive for simple
+non-POSIX APIs: the HTTP verbs are the operations allowed.
+
+Hadoop FileSystem clients for object stores attempt to make the
+stores pretend that they are a FileSystem, a FileSystem with the same
+features and operations as HDFS. This is &mdash;ultimately&mdash;a pretence:
+they have different characteristics and occasionally the illusion fails.
+
+1. **Consistency**. Object stores are generally *Eventually Consistent*: it
+can take time for changes to objects &mdash;creation, deletion and updates&mdash;
+to become visible to all callers. Indeed, there is no guarantee a change is
+immediately visible to the client which just made the change. As an example,
+an object `test/data1.csv` may be overwritten with a new set of data, but when
+a `GET test/data1.csv` call is made shortly after the update, the original data
+is returned. Hadoop assumes that filesystems are consistent; that creation, updates
+and deletions are immediately visible, and that the results of listing a directory
+are current with respect to the files within that directory.
+
+1. **Atomicity**. Hadoop assumes that directory `rename()` operations are atomic,
+as are `delete()` operations. Object store FileSystem clients implement these
+as operations on the individual objects whose names match the directory prefix.
+As a result, the changes take place a file at a time, and are not atomic. If
+an operation fails part way through the process, the state of the object store
+reflects the partially completed operation.  Note also that client code
+assumes that these operations are `O(1)` &mdash;in an object store they are
+more likely to be `O(child-entries)`.
+
+1. **Durability**. Hadoop assumes that `OutputStream` implementations write data
+to their (persistent) storage on a `flush()` operation. Object store implementations
+save all their written data to a local file, a file that is then only `PUT`
+to the object store in the final `close()` operation. As a result, there is
+never any partial data from incomplete or failed operations. Furthermore,
+as the write process only starts in the `close()` operation, that operation may take
+a time proportional to the quantity of data to upload, and inversely proportional
+to the network bandwidth. It may also fail &mdash;a failure that is better
+escalated than ignored.
+
+Object stores with these characteristics cannot be used as a direct replacement
+for HDFS. In terms of this specification, their implementations of the
+specified operations do not match those required. They are considered supported
+by the Hadoop development community, but not to the same extent as HDFS.

Added: hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/model.md
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/model.md?rev=1607596&view=auto
==============================================================================
--- hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/model.md (added)
+++ hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/model.md Thu Jul  3 12:04:50 2014
@@ -0,0 +1,230 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# A Model of a Hadoop Filesystem
+
+
+
+#### Paths and Path Elements
+
+A Path is a list of Path elements which represents a path to a file, directory or symbolic link.
+
+Path elements are non-empty strings. The exact set of valid strings MAY
+be specific to a particular FileSystem implementation.
+
+Path elements MUST NOT be in `{"", ".",  "..", "/"}`.
+
+Path elements MUST NOT contain the characters `{'/', ':'}`.
+
+Filesystems MAY have other strings that are not permitted in a path element.
+
+When validating path elements, the exception `InvalidPathException` SHOULD
+be raised when a path is invalid [HDFS]
+
+Predicate: `valid-path-element: List[String]`
+
+A path element `pe` is valid if no character in it is in the set of forbidden characters,
+and the element as a whole is not one of the forbidden strings:
+
+    forall e in pe: not (e in {'/', ':'})
+    not pe in {"", ".",  "..", "/"}
+
+
+Predicate: `valid-path: List[PathElement]`
+
+A Path `p` is *valid* if all path elements in it are valid:
+
+    def valid-path(p): forall pe in p: valid-path-element(pe)
+
+
+The set of all possible paths is *Paths*; this is the infinite set of all lists of valid path elements.
+
+The path represented by empty list, `[]` is the *root path*, and is denoted by the string `"/"`.
+
+The partial function `parent(path:Path):Path` provides the parent path; it can be defined using
+list slicing:
+
+    def parent(pe) : pe[0:-1]
+
+Preconditions:
+
+    path != []
+
+
+#### `filename:Path->PathElement`
+
+The last Path Element in a Path is called the filename.
+
+    def filename(p) : p[-1]
+
+Preconditions:
+
+    p != []
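+
+For example, applying these definitions to a three-element path:
+
+    parent(["a", "b", "c.txt"]) == ["a", "b"]
+    filename(["a", "b", "c.txt"]) == "c.txt"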
+
+#### `childElements:(Path p, Path q):Path`
+
+
+The partial function `childElements:(Path p, Path q):Path`
+is the list of path elements in `p` that follow the path `q`.
+
+    def childElements(p, q): p[len(q):]
+
+Preconditions:
+
+
+    # The path 'q' must be at the head of the path 'p'
+    q == p[:len(q)]
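+
+For example:
+
+    childElements(["a", "b", "c"], ["a"]) == ["b", "c"]
+    childElements(["a", "b", "c"], ["a", "b", "c"]) == []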
+
+
+#### `ancestors(Path): List[Path]`
+
+The list of all paths that are either the direct parent of a path p, or a parent of
+an ancestor of p.
+
+#### Notes
+
+This definition handles absolute paths but not relative ones; it needs to be reworked so the root element is explicit, presumably
+by declaring that the root (and only the root) path element may be ['/'].
+
+Relative paths can then be distinguished from absolute paths when used as the input to any
+function, and resolved when supplied as the second entry in a two-argument function
+such as `rename`.
+
+### Defining the Filesystem
+
+
+A filesystem `FS` contains a set of directories, a dictionary mapping file paths to their data, and a set of symbolic links:
+
+    (Directories:set[Path], Files:[Path:List[byte]], Symlinks:set[Path])
+
+
+Accessor functions return the specific element of a filesystem:
+
+    def directories(FS) = FS.Directories
+    def file(FS) = FS.Files
+    def symlinks(FS) = FS.Symlinks
+    def filenames(FS) = keys(FS.Files)
+
+The entire set of paths in a filesystem is a finite subset of all possible Paths; functions
+defined below resolve a path to data, to a directory predicate, or to a symbolic link:
+
+    def paths(FS) = FS.Directories + filenames(FS) + FS.Symlinks
+
+A path is deemed to exist if it is in this aggregate set:
+
+    def exists(FS, p) = p in paths(FS)
+
+The root path, "/", is a directory represented  by the path ["/"], which must always exist in a filesystem.
+
+    def isRoot(p) = p == ["/"]
+
+    forall FS in FileSystems : ["/"] in FS.Directories
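+
+As a worked example, consider a filesystem with one directory under the root,
+a single file in that directory holding some byte list `data`, and no symbolic links:
+
+    FS = ({["/"], ["/", "a"]}, {["/", "a", "f"]: data}, {})
+
+    paths(FS) == {["/"], ["/", "a"], ["/", "a", "f"]}
+    exists(FS, ["/", "a"]) == True
+    exists(FS, ["/", "b"]) == False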
+
+
+
+#### Directory references
+
+A path MAY refer to a directory in a FileSystem:
+
+    def isDir(FS, p) = p in FS.Directories
+
+Directories may have children, that is, there may exist other paths
+in the FileSystem whose path begins with a directory. Only directories
+may have children. This can be expressed
+by saying that every path's parent must be a directory.
+
+Every path either has no parent, in which case it is the root directory,
+or it MUST have a parent that is a directory:
+
+    forall p in paths(FS) : isRoot(p) or isDir(FS, parent(p))
+
+Because the parent directories of all directories must themselves satisfy
+this criterion, it is implicit that only leaf nodes may be files or symbolic links.
+
+Furthermore, because every filesystem contains the root path, every filesystem
+must contain at least one directory.
+
+A directory may have children:
+
+    def children(FS, p) = {q for q in paths(FS) where parent(q) == p}
+
+There are no duplicate names in the child paths, because all paths are
+taken from the set of lists of path elements. There can be no duplicate entries
+in a set, hence no children with duplicate names.
+
+A path *D* is a descendant of a path *P* if it is the direct child of the
+path *P* or an ancestor is a direct child of path *P*:
+
+    def isDescendant(P, D) = parent(D) == P or isDescendant(P, parent(D))
+
+The descendants of a directory P are all paths in the filesystem whose
+path begins with the path P; that is, their parent is P or an ancestor is P:
+
+    def descendants(FS, D) = {p for p in paths(FS) where isDescendant(D, p)}
+
+
+#### File references
+
+A path MAY refer to a file; that is, it has data in the filesystem and its path is a key in the data dictionary:
+
+    def isFile(FS, p) =  p in FS.Files
+
+
+#### Symbolic references
+
+A path MAY refer to a symbolic link:
+
+    def isSymlink(FS, p) = p in symlinks(FS)
+
+
+#### File Length
+
+The length of a path p in a filesystem FS is the length of the data stored, or 0 if it is a directory:
+
+    def length(FS, p) = if isFile(FS, p) : return len(FS.Files[p]) else return 0
+
+### User home
+
+The home directory of a user is an implicit part of a filesystem, and is derived from the userid of the
+process working with the filesystem:
+
+    def getHomeDirectory(FS) : Path
+
+The function `getHomeDirectory` returns the home directory for the Filesystem and the current user account.
+For some FileSystems, the path is `["/","users", System.getProperty("user-name")]`. However,
+for HDFS, the username is derived from the credentials used to authenticate the client with the
+filesystem, which may differ from the local user account name.
+
+#### Exclusivity
+
+A path cannot refer to more than one of a file, a directory or a symbolic link
+
+
+    FS.Directories ^ filenames(FS) == {}
+    FS.Directories ^ symlinks(FS) == {}
+    filenames(FS) ^ symlinks(FS) == {}
+
+
+This implies that only files may have data.
+
+This condition is invariant and is an implicit postcondition of all
+operations that manipulate the state of a FileSystem `FS`.
+
+### Notes
+
+Not covered: hard links in a FileSystem. If a FileSystem supports multiple
+references in *paths(FS)* to point to the same data, the outcome of operations
+is undefined.
+
+This model of a FileSystem is sufficient to describe all the FileSystem
+queries and manipulations excluding metadata and permission operations.
+The Hadoop `FileSystem` and `FileContext` interfaces can be specified
+in terms of operations that query or change the state of a FileSystem.

Added: hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/notation.md
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/notation.md?rev=1607596&view=auto
==============================================================================
--- hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/notation.md (added)
+++ hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/notation.md Thu Jul  3 12:04:50 2014
@@ -0,0 +1,191 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+# Notation
+
+A formal notation such as [The Z Notation](http://www.open-std.org/jtc1/sc22/open/n3187.pdf)
+would be the strictest way to define Hadoop FileSystem behavior, and could even
+be used to prove some axioms.
+
+However, it has a number of practical flaws:
+
+1. Such notations are not as widely used as they should be, so the broader software
+development community is not going to have practical experience of them.
+
+1. It's very hard to work with without dropping into tools such as LaTeX *and* add-on libraries.
+
+1. Such notations are difficult to understand, even for experts.
+
+Given that the target audience of this specification is FileSystem developers,
+formal notations are not appropriate. Instead, broad comprehensibility, ease of maintenance, and
+ease of deriving tests take priority over mathematically-pure formal notation.
+
+### Mathematics Symbols in this document
+
+This document does use a subset of [the notation in the Z syntax](http://staff.washington.edu/jon/z/glossary.html),
+but in an ASCII form, together with Python list notation for manipulating lists and sets.
+
+* `iff` : `iff` If and only if
+* `⇒` : `implies`
+* `→` : `-->` total function
+* `↛` : `->` partial function
+
+
+* `∩` : `^`: Set Intersection
+* `∪` : `+`: Set Union
+* `\` : `-`: Set Difference
+
+* `∃` : `exists` Exists predicate
+* `∀` : `forall`: For all predicate
+* `=` : `==` Equals operator
+* `≠` : `!=` operator. In Java `z ≠ y` is written as `!( z.equals(y))` for all non-simple datatypes
+* `≡` : `equivalent-to` equivalence operator. This is stricter than equals.
+* `∅` : `{}` Empty Set. `∅ ≡ {}`
+* `≈` : `approximately-equal-to` operator
+* `¬` : `not` Not operator. In Java, `!`
+* `∄` : `does-not-exist`: Does not exist predicate. Equivalent to `not exists`
+* `∧` : `and` : logical and operator. In Java, `&&`
+* `∨` : `or` : logical or operator. In Java, `||`
+* `∈` : `in` : element of
+* `∉` : `not in` : not an element of
+* `⊆` : `subset-or-equal-to` the subset or equality condition
+* `⊂` : `subset-of` the proper subset condition
+* `| p |` : `len(p)` the size of a variable
+
+* `:=` : `=` : assignment
+
+* `#` : `#` : Python-style comments
+
+* `happens-before` : `happens-before` : Lamport's ordering relationship as defined in
+[Time, Clocks and the Ordering of Events in a Distributed System](http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf)
+
+#### Sets,  Lists, Maps, and Strings
+
+The [python data structures](http://docs.python.org/2/tutorial/datastructures.html)
+are used as the basis for this syntax as it is both plain ASCII and well-known.
+
+##### Lists
+
+* A list *L* is an ordered sequence of elements `[e1, e2, ... en]`
+* The size of a list `len(L)` is the number of elements in a list.
+* Items can be addressed by a 0-based index  `e1 == L[0]`
+* Python slicing operators can address subsets of a list `L[0:3] == [e1,e2,e3]`, `L[-1] == en`
+* Lists can be concatenated `L' = L + [ e3 ]`
+* Lists can have entries removed `L' = L - [ e2, e1 ]`. This is different from Python's
+`del` operation, which operates on the list in place.
+* The membership predicate `in` returns true iff an element is a member of a List: `e2 in L`
+* List comprehensions can create new lists: `L' = [ x for x in L where x < 5]`
+* for a list `L`, `len(L)` returns the number of elements.
+
+
+##### Sets
+
+Sets are an extension of the List notation, adding the restrictions that there can
+be no duplicate entries in the set, and there is no defined order.
+
+* A set is an unordered collection of items surrounded by `{` and `}` braces.
+* When declaring one, the Python constructor `{}` is used. This is different from Python, which uses the function `set([list])`. Here the assumption
+is that the difference between a set and a dictionary can be determined from the contents.
+* The empty set `{}` has no elements.
+* All the usual set concepts apply.
+* The membership predicate is `in`.
+* Set comprehension uses the Python list comprehension.
+`S' = {s for s in S where len(s)==2}`
+* for a set *s*, `len(s)` returns the number of elements.
+* The `-` operator returns a new set excluding all items listed in the righthand set of the operator.
+
+
+
+##### Maps
+
+Maps resemble Python dictionaries: `{"key": value, "key2": value2}`
+
+* `keys(Map)` represents the set of keys in a map.
+* `k in Map` holds iff `k in keys(Map)`
+* The empty map is written `{:}`
+* The `-` operator returns a new map which excludes the entry with the key specified.
+* `len(Map)` returns the number of entries in the map.
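+
+For example:
+
+    M = {"a": 1, "b": 2}
+    keys(M) == {"a", "b"}
+    M - "b" == {"a": 1}
+    len(M) == 2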
+
+##### Strings
+
+Strings are lists of characters represented in double quotes. e.g. `"abc"`
+
+    "abc" == ['a','b','c']
+
+#### State Immutability
+
+All system state declarations are immutable.
+
+The suffix "'" (single quote) is used as the convention to indicate the state of the system after an operation:
+
+    L' = L + ['d','e']
+
+
+#### Function Specifications
+
+A function is defined as a set of preconditions and a set of postconditions,
+where the postconditions define the new state of the system and the return value from the function.
+
+
+### Exceptions
+
+In classic specification languages, the preconditions define the predicates that MUST be
+satisfied else some failure condition is raised.
+
+For Hadoop, we need to be able to specify what failure condition results if a specification is not
+met (usually what exception is to be raised).
+
+The notation `raise <exception-name>` is used to indicate that an exception is to be raised.
+
+It can be used in the if-then-else sequence to define an action if a precondition is not met.
+
+Example:
+
+    if not exists(FS, Path) : raise IOException
+
+If implementations may raise any one of a set of exceptions, this is denoted by
+providing a set of exceptions:
+
+    if not exists(FS, Path) : raise {FileNotFoundException, IOException}
+
+If a set of exceptions is provided, the earlier elements
+of the set are preferred to the later entries, on the basis that they aid diagnosis of problems.
+
+We also need to distinguish predicates that MUST be satisfied, along with those that SHOULD be met.
+For this reason a function specification MAY include a section in the preconditions marked 'Should:'
+All predicates declared in this section SHOULD be met, and if there is an entry in that section
+which specifies a stricter outcome, it SHOULD be preferred. Here is an example of a should-precondition:
+
+Should:
+
+    if not exists(FS, Path) : raise FileNotFoundException
+
+
+### Conditions
+
+There are further conditions used in precondition and postcondition declarations.
+
+
+#### `supported(instance, method)`
+
+
+This condition declares that a subclass implements the named method;
+some subclasses of the various FileSystem classes do not, and instead
+raise `UnsupportedOperationException`.
+
+As an example, one precondition of `FSDataInputStream.seek`
+is that the implementation must support `Seekable.seek`:
+
+    supported(FSDIS, Seekable.seek) else raise UnsupportedOperationException

Added: hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/testing.md
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/testing.md?rev=1607596&view=auto
==============================================================================
--- hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/testing.md (added)
+++ hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/testing.md Thu Jul  3 12:04:50 2014
@@ -0,0 +1,324 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Testing the Filesystem Contract
+
+## Running the tests
+
+A normal Hadoop test run will test those FileSystems that can be tested locally
+via the local filesystem. This typically means `file://` and its underlying `LocalFileSystem`, and
+`hdfs://` via the HDFS MiniCluster.
+
+Other filesystems are skipped unless there is a specific configuration for the
+remote server providing the filesystem.
+
+
+These filesystem bindings must be defined in an XML configuration file, usually
+`hadoop-common-project/hadoop-common/src/test/resources/contract-test-options.xml`.
+This file is excluded from version control and should not be checked in.
+
+### s3://
+
+In `contract-test-options.xml`, the filesystem name must be defined in the property `fs.contract.test.fs.s3`. The standard configuration options to define the S3 authentication details must also be provided.
+
+Example:
+
+    <configuration>
+      <property>
+        <name>fs.contract.test.fs.s3</name>
+        <value>s3://tests3hdfs/</value>
+      </property>
+
+      <property>
+        <name>fs.s3.awsAccessKeyId</name>
+        <value>DONOTPCOMMITTHISKEYTOSCM</value>
+      </property>
+
+      <property>
+        <name>fs.s3.awsSecretAccessKey</name>
+        <value>DONOTEVERSHARETHISSECRETKEY!</value>
+      </property>
+    </configuration>
+
+### s3n://
+
+
+In `contract-test-options.xml`, the filesystem name must be defined in the property `fs.contract.test.fs.s3n`. The standard configuration options to define the S3N authentication details must also be provided.
+
+Example:
+
+
+    <configuration>
+      <property>
+        <name>fs.contract.test.fs.s3n</name>
+        <value>s3n://tests3contract</value>
+      </property>
+
+      <property>
+        <name>fs.s3n.awsAccessKeyId</name>
+        <value>DONOTPCOMMITTHISKEYTOSCM</value>
+      </property>
+
+      <property>
+        <name>fs.s3n.awsSecretAccessKey</name>
+        <value>DONOTEVERSHARETHISSECRETKEY!</value>
+      </property>
+    </configuration>
+
+### ftp://
+
+
+In `contract-test-options.xml`, the filesystem name must be defined in
+the property `fs.contract.test.fs.ftp`. The specific login options to
+connect to the FTP Server must then be provided.
+
+A path to a test directory must also be provided in the option
+`fs.contract.test.ftp.testdir`. This is the directory under which
+operations take place.
+
+Example:
+
+
+    <configuration>
+      <property>
+        <name>fs.contract.test.fs.ftp</name>
+        <value>ftp://server1/</value>
+      </property>
+
+      <property>
+        <name>fs.ftp.user.server1</name>
+        <value>testuser</value>
+      </property>
+
+      <property>
+        <name>fs.contract.test.ftp.testdir</name>
+        <value>/home/testuser/test</value>
+      </property>
+
+      <property>
+        <name>fs.ftp.password.server1</name>
+        <value>secret-login</value>
+      </property>
+    </configuration>
+
+
+### swift://
+
+The OpenStack Swift login details must be defined in the file
+`/hadoop-tools/hadoop-openstack/src/test/resources/contract-test-options.xml`.
+The standard hadoop-common `contract-test-options.xml` resource file cannot be
+used, as that file does not get included in `hadoop-common-test.jar`.
+
+
+In `/hadoop-tools/hadoop-openstack/src/test/resources/contract-test-options.xml`
+the Swift bucket name must be defined in the property `fs.contract.test.fs.swift`,
+along with the login details for the specific Swift service provider in which the
+bucket is hosted.
+
+    <configuration>
+      <property>
+        <name>fs.contract.test.fs.swift</name>
+        <value>swift://swiftbucket.rackspace/</value>
+      </property>
+
+      <property>
+        <name>fs.swift.service.rackspace.auth.url</name>
+        <value>https://auth.api.rackspacecloud.com/v2.0/tokens</value>
+        <description>Rackspace US (multiregion)</description>
+      </property>
+
+      <property>
+        <name>fs.swift.service.rackspace.username</name>
+        <value>this-is-your-username</value>
+      </property>
+
+      <property>
+        <name>fs.swift.service.rackspace.region</name>
+        <value>DFW</value>
+      </property>
+
+      <property>
+        <name>fs.swift.service.rackspace.apikey</name>
+        <value>ab0bceyoursecretapikeyffef</value>
+      </property>
+
+    </configuration>
+
+1. Often the different public cloud Swift infrastructures exhibit different behaviors
+(authentication and throttling in particular). We recommend that testers create
+accounts on as many of these providers as possible and test against each of them.
+1. They can be slow, especially remotely. Remote links are also the most likely
+to make eventual-consistency behaviors visible, which is a mixed benefit.
+
+## Testing a new filesystem
+
+The core of adding a new FileSystem to the contract tests is adding a
+new contract class, then creating a new non-abstract test class for every test
+suite that you wish to test.
+
+1. Do not try to add these tests into Hadoop itself. They won't be added to
+the source tree. The tests must live with your own filesystem source.
+1. Create a package in your own test source tree (usually under `contract`)
+for the files and tests.
+1. Subclass `AbstractFSContract` for your own contract implementation.
+1. For every test suite you plan to support create a non-abstract subclass,
+ with the name starting with `Test` and the name of the filesystem.
+ Example: `TestHDFSRenameContract`.
+1. These non-abstract classes must implement the abstract method
+ `createContract()`.
+1. Identify and document any filesystem bindings that must be defined in a
+ `src/test/resources/contract-test-options.xml` file of the specific project.
+1. Run the tests until they work.
+
+
+As an example, here is the implementation of the test of the `create()` tests for the local filesystem.
+
+    package org.apache.hadoop.fs.contract.localfs;
+
+    import org.apache.hadoop.conf.Configuration;
+    import org.apache.hadoop.fs.contract.AbstractCreateContractTest;
+    import org.apache.hadoop.fs.contract.AbstractFSContract;
+
+    public class TestLocalCreateContract extends AbstractCreateContractTest {
+      @Override
+      protected AbstractFSContract createContract(Configuration conf) {
+        return new LocalFSContract(conf);
+      }
+    }
+
+The standard implementation technique for subclasses of `AbstractFSContract` is to be driven entirely by a Hadoop XML configuration file stored in the test resource tree. The best practice is to store it under `/contract` with the name of the FileSystem, such as `contract/localfs.xml`. Having the XML file define all FileSystem options makes the listing of FileSystem behaviors immediately visible.
+
+The `LocalFSContract` is a special case of this, as it must adjust its case sensitivity policy based on the OS on which it is running: for both Windows and OS/X, the filesystem is case insensitive, so the `ContractOptions.IS_CASE_SENSITIVE` option must be set to false. Furthermore, the Windows filesystem does not support Unix file and directory permissions, so the relevant flag must also be set. This is done *after* loading the XML contract file from the resource tree, simply by updating the now-loaded configuration options:
+
+      getConf().setBoolean(getConfKey(ContractOptions.SUPPORTS_UNIX_PERMISSIONS), false);
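+
+For a brand-new filesystem, a contract class built in the same XML-driven way
+might look like the sketch below. The package, class, scheme and resource names
+are purely illustrative, and it assumes the `addConfResource()` helper on
+`AbstractFSContract` is the mechanism used to load the per-filesystem XML file;
+check that class for the exact methods before copying this:
+
+    package org.example.fs.contract;                  // hypothetical package
+
+    import java.io.IOException;
+    import java.net.URI;
+
+    import org.apache.hadoop.conf.Configuration;
+    import org.apache.hadoop.fs.FileSystem;
+    import org.apache.hadoop.fs.Path;
+    import org.apache.hadoop.fs.contract.AbstractFSContract;
+
+    public class MyFSContract extends AbstractFSContract {  // hypothetical name
+
+      /** Resource in the test classpath declaring the filesystem's behaviors. */
+      public static final String CONTRACT_XML = "contract/myfs.xml";
+
+      public MyFSContract(Configuration conf) {
+        super(conf);
+        // load the declared behaviors of this filesystem
+        addConfResource(CONTRACT_XML);
+      }
+
+      @Override
+      public String getScheme() {
+        return "myfs";
+      }
+
+      @Override
+      public FileSystem getTestFileSystem() throws IOException {
+        // bind to the filesystem under test; the URI here is illustrative
+        return FileSystem.get(URI.create("myfs:///"), getConf());
+      }
+
+      @Override
+      public Path getTestPath() {
+        return new Path("/test");
+      }
+    }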
+
+
+
+### Handling test failures
+
+If your new `FileSystem` test cases fail one of the contract tests, what can you do?
+
+It depends on the cause of the problem:
+
+1. Case: your custom `FileSystem` subclass does not correctly implement the specification. Fix it.
+1. Case: Underlying filesystem doesn't behave in a way that matches Hadoop's expectations. Ideally, fix. Or try to make your `FileSystem` subclass hide the differences, e.g. by translating exceptions.
+1. Case: fundamental architectural differences between your filesystem and Hadoop. Example: different concurrency and consistency model. Recommendation: document and make clear that the filesystem is not compatible with HDFS.
+1. Case: test does not match the specification. Fix: patch test, submit the patch to Hadoop.
+1. Case: specification incorrect. The underlying specification is (with a few exceptions) HDFS. If the specification does not match HDFS, HDFS should normally be assumed to be the real definition of what a FileSystem should do. If there's a mismatch, please raise it on the `hdfs-dev` mailing list. Note that while FileSystem tests live in the core Hadoop codebase, it is the HDFS team who owns the FileSystem specification and the tests that accompany it.
+
+If a test needs to be skipped because a feature is not supported, look for an existing configuration option in the `ContractOptions` class. If there is no such option, the short-term fix is to override the test method and call `ContractTestUtils.skip()` with a message explaining why the test is skipped. Using this method prints the message to the logs, then tells the test runner that the test was skipped; this highlights the problem. A sketch of this short-term fix follows.
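+
+Inside one of your concrete test classes, that might look like this (the test
+method name is hypothetical):
+
+      @Override
+      public void testRenameFileOverExistingFile() throws Throwable {
+        // this store cannot pass the test: log it and skip
+        ContractTestUtils.skip("myfs does not support overwriting on rename");
+      }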
+
+A recommended strategy is to override the test, call the superclass implementation, catch the exception, and verify that the exception class and part of the error string match those raised by the current implementation. The override should also `fail()` if the superclass test actually succeeded, that is, if the filesystem did not fail in the way the current implementation does. This ensures that the test path is still executed, that any other failure of the test (possibly a regression) is picked up, and that if the feature does become implemented, the change is noticed. A sketch of this pattern is shown below.
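+
+The test method name, exception class and error text below are illustrative and
+must be matched to what your filesystem actually does; the usual JUnit
+assertions are assumed to be available in the test class:
+
+      @Override
+      public void testRenameNonexistentFile() throws Throwable {
+        try {
+          super.testRenameNonexistentFile();
+          fail("expected the rename to fail against this filesystem");
+        } catch (java.io.FileNotFoundException e) {
+          // the current implementation fails this way; check part of the text
+          // so that any change in behavior is noticed
+          assertTrue("unexpected error: " + e, e.toString().contains("rename"));
+        }
+      }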
+
+A long-term solution is to enhance the base test to add a new optional feature key. This will require collaboration with the developers on the `hdfs-dev` mailing list.
+
+
+
+### 'Lax vs Strict' exceptions
+
+The contract tests include the notion of strict vs lax exceptions. *Strict* exception reporting means that failures are reported using specific subclasses of `IOException`, such as `FileNotFoundException`, `EOFException` and so on. *Lax* reporting means that a plain `IOException` is thrown.
+
+While FileSystems SHOULD raise the stricter exceptions, there may be reasons why they cannot. Raising lax exceptions is still allowed; it merely hampers diagnostics of failures in user applications. To declare that a FileSystem does not support the stricter exceptions, set the option `fs.contract.supports-strict-exceptions` to false.
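+
+For example, the option can be declared in the filesystem's contract XML file
+(`contract/myfs.xml` is a hypothetical name):
+
+      <property>
+        <name>fs.contract.supports-strict-exceptions</name>
+        <value>false</value>
+      </property>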
+
+### Supporting FileSystems with login and authentication parameters
+
+Tests against remote FileSystems will require the URL to the FileSystem to be specified;
+tests against remote FileSystems that require login details require usernames/IDs and passwords.
+
+All these details MUST be placed in the file `src/test/resources/contract-test-options.xml`, and your SCM tools configured to never commit this file to subversion, git or
+equivalent. Furthermore, the build MUST be configured to never bundle this file in any `-test` artifacts generated. The Hadoop build does this by excluding `src/test/**/*.xml` from the JAR files.
+
+The `AbstractFSContract` class automatically loads this resource file if present; specific keys for specific test cases can be added.
+
+As an example, here is what the S3N test keys look like:
+
+    <configuration>
+      <property>
+        <name>fs.contract.test.fs.s3n</name>
+        <value>s3n://tests3contract</value>
+      </property>
+
+      <property>
+        <name>fs.s3n.awsAccessKeyId</name>
+        <value>DONOTPCOMMITTHISKEYTOSCM</value>
+      </property>
+
+      <property>
+        <name>fs.s3n.awsSecretAccessKey</name>
+        <value>DONOTEVERSHARETHISSECRETKEY!</value>
+      </property>
+    </configuration>
+
+The `AbstractBondedFSContract` automatically skips a test suite if the FileSystem URL is not defined in the property `fs.contract.test.fs.%s`, where `%s` matches the scheme name of the FileSystem.
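+
+A contract that bonds to a filesystem this way only needs to name its scheme;
+the bonded base class takes care of looking up the URL, instantiating the
+filesystem and skipping the suite when the URL is absent. A minimal sketch,
+with hypothetical package, class and scheme names (a real contract would
+normally also load its contract XML options as described earlier):
+
+    package org.example.fs.contract;                  // hypothetical package
+
+    import org.apache.hadoop.conf.Configuration;
+    import org.apache.hadoop.fs.contract.AbstractBondedFSContract;
+
+    public class MyFSBondedContract extends AbstractBondedFSContract {
+
+      public MyFSBondedContract(Configuration conf) {
+        super(conf);
+      }
+
+      @Override
+      public String getScheme() {
+        // the test filesystem URL is then read from fs.contract.test.fs.myfs
+        return "myfs";
+      }
+    }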
+
+
+
+### Important: passing the tests does not guarantee compatibility
+
+Passing all the FileSystem contract tests does not mean that a filesystem can be described as "compatible with HDFS". The tests try to look at the isolated functionality of each operation, and focus on the preconditions and postconditions of each action. Core areas not covered are concurrency and aspects of failure across a distributed system.
+
+* Consistency: are all changes immediately visible?
+* Atomicity: are operations which HDFS guarantees to be atomic equally atomic on the new filesystem?
+* Idempotency: if the filesystem implements any retry policy, are retried operations idempotent even while other clients manipulate the filesystem?
+* Scalability: does it support files as large as HDFS, or as many in a single directory?
+* Durability: do files actually last, and for how long?
+
+Proof of this is the fact that the Amazon S3 and OpenStack Swift object stores are eventually consistent, with non-atomic rename and delete operations. Single-threaded test cases are unlikely to see some of the concurrency issues, while consistency problems are very often only visible in tests that span a datacenter.
+
+There are also some specific aspects of the use of the FileSystem API:
+
+* Compatibility with the `hadoop fs` CLI.
+* Whether the blocksize policy produces file splits that are suitable for analytics workloads (as an example, a blocksize of 1 matches the specification, but as it tells MapReduce jobs to work a byte at a time, it is unusable).
+
+Tests that verify these behaviors are of course welcome.
+
+
+
+## Adding a new test suite
+
+1. New tests should be split up with a test class per operation, as is done for `seek()`, `rename()`, `create()`, and so on. This is to match the way that the FileSystem contract specification is split up by operation. It also makes it easier for FileSystem implementors to work on one test suite at a time.
+2. Subclass `AbstractFSContractTestBase` with a new abstract test suite class. Again, use `Abstract` in the class name; a sketch of such a suite follows this list.
+3. Look at `org.apache.hadoop.fs.contract.ContractTestUtils` for utility methods to aid testing, with lots of filesystem-centric assertions. Use these to make assertions about the filesystem state, and to include diagnostics information such as directory listings and dumps of mismatched files when an assertion actually fails.
+4. Write tests for the local, raw local and HDFS filesystems; if one of these fails the tests, that is a sign of a problem, though be aware that they do have differences.
+5. Test on the object stores once the core filesystems are passing the tests.
+6. Try to log failures with as much detail as you can; the people debugging the failures will appreciate it.
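+
+As a hedged sketch, a new suite might look like the following; the class name,
+test name and file size are illustrative only:
+
+    package org.apache.hadoop.fs.contract;
+
+    import org.apache.hadoop.fs.Path;
+    import org.junit.Test;
+
+    import static org.apache.hadoop.fs.contract.ContractTestUtils.assertFileHasLength;
+    import static org.apache.hadoop.fs.contract.ContractTestUtils.createFile;
+    import static org.apache.hadoop.fs.contract.ContractTestUtils.dataset;
+
+    /**
+     * Example suite: each filesystem then supplies a concrete
+     * TestXYZExampleContract-style subclass implementing createContract().
+     */
+    public abstract class AbstractContractExampleTest
+        extends AbstractFSContractTestBase {
+
+      @Test
+      public void testCreateThenVerifyLength() throws Throwable {
+        Path file = new Path(path("test"), "file");
+        byte[] block = dataset(1024, 'a', 'z');
+        createFile(getFileSystem(), file, false, block);
+        // the ContractTestUtils assertions include diagnostics on failure
+        assertFileHasLength(getFileSystem(), file, block.length);
+      }
+    }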
+
+
+### Root manipulation tests
+
+Some tests work directly against the root filesystem, attempting to do things like rename "/" and similar actions. The root directory is "special", and it's important to test this, especially on non-POSIX filesystems such as object stores. These tests are potentially very destructive to native filesystems, so use care.
+
+1. Add the tests under `AbstractRootDirectoryContractTest` or create a new test with (a) `Root` in the title and (b) a check in the setup method to skip the test if root tests are disabled:
+
+          skipIfUnsupported(TEST_ROOT_TESTS_ENABLED);
+
+1. Don't provide an implementation of this test suite to run against the local FS.
+
+### Scalability tests
+
+Tests designed to generate scalable load (and that includes a large number of small files, as well as fewer, larger files) should themselves be configurable, so that users of the test
+suite can set the number and size of files.
+
+Be aware that on object stores, the directory rename operation is usually `O(files)*O(data)` while the delete operation is `O(files)`. The latter means that even directory cleanup operations may take time and can potentially time out. It is important to design tests that work against remote filesystems with possible delays in all operations.
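+
+One way to do this, sketched with hypothetical option names, is to read the
+tuning parameters from the contract configuration in the test setup (this
+assumes the test base class exposes its contract via `getContract()`):
+
+      // hypothetical, tunable option names; small defaults keep local runs fast
+      int fileCount = getContract().getConf()
+          .getInt("fs.contract.test.scale.file-count", 10);
+      int fileSizeKB = getContract().getConf()
+          .getInt("fs.contract.test.scale.file-size-kb", 128);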
+
+## Extending the specification
+
+The specification is incomplete. It doesn't have complete coverage of the FileSystem classes, and there may be bits of the existing specified classes that are not covered.
+
+1. Look at the implementations of a class/interface/method to see what they do, especially HDFS and local. These are the documentation of what is done today.
+2. Look at the POSIX API specification.
+3. Search through the HDFS JIRAs for discussions on FileSystem topics, and try to understand what was meant to happen, as well as what does happen.
+4. Use an IDE to find out how methods are used in Hadoop, HBase and other parts of the stack. Although this assumes that these are representative Hadoop applications, it will at least show how applications *expect* a FileSystem to behave.
+6. Look in the java.io source to see how the bundled FileSystem classes are expected to behave, and read their javadocs carefully.
+7. If something is unclear, ask on the `hdfs-dev` list.
+8. Don't be afraid to write tests to act as experiments and clarify what actually happens. Use the HDFS behaviors as the normative guide.

Modified: hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestLocalFileSystem.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestLocalFileSystem.java?rev=1607596&r1=1607595&r2=1607596&view=diff
==============================================================================
--- hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestLocalFileSystem.java (original)
+++ hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestLocalFileSystem.java Thu Jul  3 12:04:50 2014
@@ -227,7 +227,7 @@ public class TestLocalFileSystem {
     try {
       fileSys.mkdirs(bad_dir);
       fail("Failed to detect existing file in path");
-    } catch (FileAlreadyExistsException e) { 
+    } catch (ParentNotDirectoryException e) {
       // Expected
     }
     

Added: hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractBondedFSContract.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractBondedFSContract.java?rev=1607596&view=auto
==============================================================================
--- hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractBondedFSContract.java (added)
+++ hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractBondedFSContract.java Thu Jul  3 12:04:50 2014
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.contract;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.net.URI;
+import java.net.URISyntaxException;
+
+/**
+ * This is a filesystem contract for any class that bonds to a filesystem
+ * through the configuration.
+ *
+ * It looks for a definition of the test filesystem with the key
+ * derived from "fs.contract.test.fs.%s"; if found, the value
+ * is converted to a URI and used to create a filesystem. If not,
+ * the tests are not enabled.
+ */
+public abstract class AbstractBondedFSContract extends AbstractFSContract {
+
+  private static final Log LOG =
+      LogFactory.getLog(AbstractBondedFSContract.class);
+
+  /**
+   * Pattern for the option naming the test filesystem, keyed by URI scheme.
+   */
+  public static final String FSNAME_OPTION = "test.fs.%s";
+
+  /**
+   * Constructor: loads the authentication keys if found.
+   *
+   * @param conf configuration to work with
+   */
+  protected AbstractBondedFSContract(Configuration conf) {
+    super(conf);
+  }
+
+  private String fsName;
+  private URI fsURI;
+  private FileSystem filesystem;
+
+  @Override
+  public void init() throws IOException {
+    super.init();
+    //this test is only enabled if the test FS is present
+    fsName = loadFilesystemName(getScheme());
+    setEnabled(!fsName.isEmpty());
+    if (isEnabled()) {
+      try {
+        fsURI = new URI(fsName);
+        filesystem = FileSystem.get(fsURI, getConf());
+      } catch (URISyntaxException e) {
+        throw new IOException("Invalid URI " + fsName);
+      } catch (IllegalArgumentException e) {
+        throw new IOException("Invalid URI " + fsName, e);
+      }
+    } else {
+      LOG.info("skipping tests as FS name is not defined in "
+              + getFilesystemConfKey());
+    }
+  }
+
+  /**
+   * Load the name of a test filesystem.
+   * @param schema schema to look up
+   * @return the filesystem name -or "" if none was defined
+   */
+  public String loadFilesystemName(String schema) {
+    return getOption(String.format(FSNAME_OPTION, schema), "");
+  }
+
+  /**
+   * Get the conf key for a filesystem
+   */
+  protected String getFilesystemConfKey() {
+    return getConfKey(String.format(FSNAME_OPTION, getScheme()));
+  }
+
+  @Override
+  public FileSystem getTestFileSystem() throws IOException {
+    return filesystem;
+  }
+
+  @Override
+  public Path getTestPath() {
+    return new Path("/test");
+  }
+
+  @Override
+  public String toString() {
+    return getScheme() + " Contract against " + fsName;
+  }
+}

Propchange: hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractBondedFSContract.java
------------------------------------------------------------------------------
    svn:eol-style = native

Added: hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractAppendTest.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractAppendTest.java?rev=1607596&view=auto
==============================================================================
--- hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractAppendTest.java (added)
+++ hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractAppendTest.java Thu Jul  3 12:04:50 2014
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *       http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.fs.contract;
+
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.Path;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.hadoop.fs.contract.ContractTestUtils.cleanup;
+import static org.apache.hadoop.fs.contract.ContractTestUtils.createFile;
+import static org.apache.hadoop.fs.contract.ContractTestUtils.dataset;
+import static org.apache.hadoop.fs.contract.ContractTestUtils.touch;
+
+/**
+ * Test append -if supported
+ */
+public abstract class AbstractContractAppendTest extends AbstractFSContractTestBase {
+  private static final Logger LOG =
+      LoggerFactory.getLogger(AbstractContractAppendTest.class);
+
+  private Path testPath;
+  private Path target;
+
+  @Override
+  public void setup() throws Exception {
+    super.setup();
+    skipIfUnsupported(SUPPORTS_APPEND);
+
+    //set up the paths used by the test cases
+    testPath = path("test");
+    target = new Path(testPath, "target");
+  }
+
+  @Test
+  public void testAppendToEmptyFile() throws Throwable {
+    touch(getFileSystem(), target);
+    byte[] dataset = dataset(256, 'a', 'z');
+    FSDataOutputStream outputStream = getFileSystem().append(target);
+    try {
+      outputStream.write(dataset);
+    } finally {
+      outputStream.close();
+    }
+    byte[] bytes = ContractTestUtils.readDataset(getFileSystem(), target,
+                                                 dataset.length);
+    ContractTestUtils.compareByteArrays(dataset, bytes, dataset.length);
+  }
+
+  @Test
+  public void testAppendNonexistentFile() throws Throwable {
+    try {
+      FSDataOutputStream out = getFileSystem().append(target);
+      //got here: trouble
+      out.close();
+      fail("expected a failure");
+    } catch (Exception e) {
+      //expected
+      handleExpectedException(e);
+    }
+  }
+
+  @Test
+  public void testAppendToExistingFile() throws Throwable {
+    byte[] original = dataset(8192, 'A', 'Z');
+    byte[] appended = dataset(8192, '0', '9');
+    createFile(getFileSystem(), target, false, original);
+    FSDataOutputStream outputStream = getFileSystem().append(target);
+    outputStream.write(appended);
+    outputStream.close();
+    byte[] bytes = ContractTestUtils.readDataset(getFileSystem(), target,
+                                                 original.length + appended.length);
+    ContractTestUtils.validateFileContent(bytes,
+            new byte[] [] { original, appended });
+  }
+
+  @Test
+  public void testAppendMissingTarget() throws Throwable {
+    try {
+      FSDataOutputStream out = getFileSystem().append(target);
+      //got here: trouble
+      out.close();
+      fail("expected a failure");
+    } catch (Exception e) {
+      //expected
+      handleExpectedException(e);
+    }
+  }
+
+  @Test
+  public void testRenameFileBeingAppended() throws Throwable {
+    touch(getFileSystem(), target);
+    assertPathExists("original file does not exist", target);
+    byte[] dataset = dataset(256, 'a', 'z');
+    FSDataOutputStream outputStream = getFileSystem().append(target);
+    outputStream.write(dataset);
+    Path renamed = new Path(testPath, "renamed");
+    // rename the file while the append stream is still open, then close it
+    assertTrue("rename failed", getFileSystem().rename(target, renamed));
+    outputStream.close();
+    String listing = ls(testPath);
+
+    //expected: the stream goes to the file that was being renamed, not
+    //the original path
+    assertPathExists("renamed destination file does not exist", renamed);
+
+    assertPathDoesNotExist("Source file found after rename during append:\n" +
+                           listing, target);
+    byte[] bytes = ContractTestUtils.readDataset(getFileSystem(), renamed,
+                                                 dataset.length);
+    ContractTestUtils.compareByteArrays(dataset, bytes, dataset.length);
+  }
+}

Propchange: hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractAppendTest.java
------------------------------------------------------------------------------
    svn:eol-style = native

Added: hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractConcatTest.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractConcatTest.java?rev=1607596&view=auto
==============================================================================
--- hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractConcatTest.java (added)
+++ hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractConcatTest.java Thu Jul  3 12:04:50 2014
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *       http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.fs.contract;
+
+import org.apache.hadoop.fs.Path;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.hadoop.fs.contract.ContractTestUtils.assertFileHasLength;
+import static org.apache.hadoop.fs.contract.ContractTestUtils.cleanup;
+import static org.apache.hadoop.fs.contract.ContractTestUtils.createFile;
+import static org.apache.hadoop.fs.contract.ContractTestUtils.dataset;
+import static org.apache.hadoop.fs.contract.ContractTestUtils.touch;
+
+/**
+ * Test concat -if supported
+ */
+public abstract class AbstractContractConcatTest extends AbstractFSContractTestBase {
+  private static final Logger LOG =
+      LoggerFactory.getLogger(AbstractContractConcatTest.class);
+
+  private Path testPath;
+  private Path srcFile;
+  private Path zeroByteFile;
+  private Path target;
+
+  @Override
+  public void setup() throws Exception {
+    super.setup();
+    skipIfUnsupported(SUPPORTS_CONCAT);
+
+    //set up the paths and source data used by the test cases
+    testPath = path("test");
+    srcFile = new Path(testPath, "small.txt");
+    zeroByteFile = new Path(testPath, "zero.txt");
+    target = new Path(testPath, "target");
+
+    byte[] block = dataset(TEST_FILE_LEN, 0, 255);
+    createFile(getFileSystem(), srcFile, false, block);
+    touch(getFileSystem(), zeroByteFile);
+  }
+
+  @Test
+  public void testConcatEmptyFiles() throws Throwable {
+    touch(getFileSystem(), target);
+    try {
+      getFileSystem().concat(target, new Path[0]);
+      fail("expected a failure");
+    } catch (Exception e) {
+      //expected
+      handleExpectedException(e);
+    }
+  }
+
+  @Test
+  public void testConcatMissingTarget() throws Throwable {
+    try {
+      getFileSystem().concat(target,
+                             new Path[] { zeroByteFile});
+      fail("expected a failure");
+    } catch (Exception e) {
+      //expected
+      handleExpectedException(e);
+    }
+  }
+
+  @Test
+  public void testConcatFileOnFile() throws Throwable {
+    byte[] block = dataset(TEST_FILE_LEN, 0, 255);
+    createFile(getFileSystem(), target, false, block);
+    getFileSystem().concat(target,
+                           new Path[] {srcFile});
+    assertFileHasLength(getFileSystem(), target, TEST_FILE_LEN * 2);
+    ContractTestUtils.validateFileContent(
+      ContractTestUtils.readDataset(getFileSystem(),
+                                    target, TEST_FILE_LEN * 2),
+      new byte[][]{block, block});
+  }
+
+  @Test
+  public void testConcatOnSelf() throws Throwable {
+    byte[] block = dataset(TEST_FILE_LEN, 0, 255);
+    createFile(getFileSystem(), target, false, block);
+    try {
+      getFileSystem().concat(target,
+                             new Path[]{target});
+      // concatenating a file onto itself should be rejected
+      fail("expected a failure");
+    } catch (Exception e) {
+      //expected
+      handleExpectedException(e);
+    }
+  }
+}

Propchange: hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractConcatTest.java
------------------------------------------------------------------------------
    svn:eol-style = native



Mime
View raw message