hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14365) Stabilise FileSystem builder-based create API
Date Mon, 01 May 2017 13:26:04 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15990799#comment-15990799 ]

Steve Loughran commented on HADOOP-14365:

Also, I want some generic setOption calls which let me set bool, long, int and string values:
builder = fs.createFile("s3a://stevel/datasets/set1")
builder.setOption("s3a:encryption", true)
builder.setOption("s3a:encryption.kms.key", "AAZIF")
builder.setOption("s3a:acls", "aclinfo1", "aclInfo2", "aclInfo3")

Today we can only set those options on an FS-by-FS basis; indeed, it's only since HADOOP-13336
that we've had per-bucket config. Making it per-file would be one more step change.
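
For illustration only, the generic setOption calls above could be backed by something like the following. This is a minimal sketch, not the Hadoop API: the class name CreateFileOptions, the method optionsWithPrefix() and the "hdfs:replication" key are hypothetical, and the only idea taken from the comment is the overloaded, string-keyed setOption(). Each store would read the keys under its own prefix and ignore the rest, so the caller needs neither a cast nor a store-specific class.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of per-file, string-keyed create options. Not the Hadoop API. */
class CreateFileOptions {
  private final String path;
  private final Map<String, Object> options = new LinkedHashMap<>();

  CreateFileOptions(String path) {
    this.path = path;
  }

  // One overload per value type, so callers never cast to a store-specific client.
  CreateFileOptions setOption(String key, boolean value) { return put(key, value); }
  CreateFileOptions setOption(String key, int value) { return put(key, value); }
  CreateFileOptions setOption(String key, long value) { return put(key, value); }

  // A single string is stored as a String, several strings as a List<String>.
  CreateFileOptions setOption(String key, String... values) {
    return put(key, values.length == 1 ? values[0] : Arrays.asList(values.clone()));
  }

  private CreateFileOptions put(String key, Object value) {
    options.put(key, value);
    return this;
  }

  String getPath() {
    return path;
  }

  /** Options whose key starts with the given prefix, in insertion order. */
  Map<String, Object> optionsWithPrefix(String prefix) {
    Map<String, Object> matched = new LinkedHashMap<>();
    for (Map.Entry<String, Object> e : options.entrySet()) {
      if (e.getKey().startsWith(prefix)) {
        matched.put(e.getKey(), e.getValue());
      }
    }
    return matched;
  }

  public static void main(String[] args) {
    CreateFileOptions builder = new CreateFileOptions("s3a://stevel/datasets/set1")
        .setOption("s3a:encryption", true)
        .setOption("s3a:encryption.kms.key", "AAZIF")
        .setOption("s3a:acls", "aclinfo1", "aclInfo2", "aclInfo3")
        .setOption("hdfs:replication", 3); // hypothetical key, not relevant to S3A
    // An S3A-like store would look only at its own keys.
    System.out.println(builder.optionsWithPrefix("s3a:"));
  }
}
```

Because the options are plain strings, code that sets "s3a:" keys still compiles and runs when the S3A connector is not on the classpath; whether unknown keys should be silently ignored or rejected is a design decision this sketch leaves open.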

If we can set options this way, there's no need to have separate methods for every feature
which is added. Equally critically, it stops me having to cast the FS into the FS client which
I require to set an option. For example, to play with favored nodes in HDFS I have to write:

FileSystem fs = destPath.getFileSystem(conf);
FSDataOutputStream out;
if (fs instanceof DistributedFileSystem) {
  // cast to the HDFS client to reach its HDFS-specific builder
  dfs = (DistributedFileSystem) fs;
  builder = dfs.newFSDataOutputStreamBuilder(destPath);
  // (the favored nodes would be set on the builder here)
  out = builder.build();
} else {
  out = fs.newFSDataOutputStreamBuilder(destPath).build();
}

It just gets too convoluted fast, especially if the options for Azure differ from those for
S3A, which in turn differ from those for HDFS.
Even worse: if we did add object-store-specific builders, you'd need them on the classpath before
your code can use them. Maybe you can get away with that assumption for HDFS, but we can't
for the others, especially when there are some (Google GCS) which aren't even in the Hadoop

I really like this idea; if it works we could think of adding a similar openFile() operation
which lets us set a fadvise=random option, a retry policy, etc.

> Stabilise FileSystem builder-based create API 
> ----------------------------------------------
>                 Key: HADOOP-14365
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14365
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.9.0
>            Reporter: Steve Loughran
>            Priority: Blocker
> HDFS-11170 added a builder-based create API for file creation which has a few issues
to work out before it can be considered ready for use.
> 1. There is no specification in filesystem.md of what it is meant to do, which means
there's no public documentation of expected behaviour except the Javadocs, which consist
of the sentences "Create a new FSDataOutputStreamBuilder for the file with path" and "Base
of specific file system FSDataOutputStreamBuilder".
> I propose:
> # Give the new method a relevant name rather than just define the return type, e.g. {{createFile()}}.

> # `Filesystem.md` to be extended with coverage of this method, and, sadly for the authors,
coverage of what the semantics of {{FSDataOutputStreamBuilder.build()}} are.
> 2. There are only tests for HDFS and local, neither of them perfect. Proposed: move to
{{AbstractContractCreateTest}}, test for all filesystems, fix tests and FS where appropriate.

> 3. Add more tests to generate the failure conditions implied by the updated filesystem
spec, e.g. create over an existing file, create over a directory, create with negative buffer
size, negative block size, empty dest path, etc.
> This will clarify when precondition checks are made, as well as whether they are made at all. For example:
should {{newFSDataOutputStreamBuilder()}} validate the path immediately?
> 4. Add to {{FileContext}}.
> 5. Take the opportunity to look at the flaws in today's {{create()}} calls and address
them, rather than replicate. In particular, I'd like to end the behaviour "create all parent
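
To make the "when are precondition checks made" question in item 3 of the quoted description concrete, here is a minimal sketch in which every check is deferred to build(). That timing is an assumption chosen for illustration, not what the Hadoop code does; CreateRequest, its Builder, the default values and the exception type are all hypothetical. The checks for an existing file or directory are left out because they need a real filesystem.

```java
/** Hypothetical sketch: a create request whose preconditions are checked in build(). */
class CreateRequest {
  final String path;
  final int bufferSize;
  final long blockSize;

  private CreateRequest(String path, int bufferSize, long blockSize) {
    this.path = path;
    this.bufferSize = bufferSize;
    this.blockSize = blockSize;
  }

  static class Builder {
    private final String path;
    private int bufferSize = 4096;               // illustrative default
    private long blockSize = 128L * 1024 * 1024; // illustrative default

    // Nothing is validated here: an invalid path is only rejected by build().
    Builder(String path) {
      this.path = path;
    }

    Builder bufferSize(int value) {
      this.bufferSize = value;
      return this;
    }

    Builder blockSize(long value) {
      this.blockSize = value;
      return this;
    }

    /** All precondition checks happen at this single point. */
    CreateRequest build() {
      if (path == null || path.isEmpty()) {
        throw new IllegalArgumentException("empty destination path");
      }
      if (bufferSize < 0) {
        throw new IllegalArgumentException("negative buffer size: " + bufferSize);
      }
      if (blockSize < 0) {
        throw new IllegalArgumentException("negative block size: " + blockSize);
      }
      return new CreateRequest(path, bufferSize, blockSize);
    }
  }
}
```

A contract test would then assert that each invalid input fails at the point the specification names; moving the path check into the constructor would change which call those tests expect to throw.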

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
