hadoop-common-dev mailing list archives

From "Sanjay Radia (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4952) Improved files system interface for the application writer.
Date Mon, 05 Jan 2009 20:19:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660900#action_12660900 ]

Sanjay Radia commented on HADOOP-4952:
--------------------------------------


> Also, create() has too many parameters and is fragile.

The proposed Files interface significantly reduces the number of create methods. It accomplishes this partly by allowing one to pass -1 for certain parameters (block size, replication factor, buffer size) to use their default values.

Thus the first create method is very simple and robust for type checking:
{code}
  public static FSDataOutputStream create(Path f,
                                    FsPermission permission,
                                    boolean overwrite) throws IOException 
{code}
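
For illustration, calling this form would look something like the sketch below (assuming the static Files class from the attached Files.java; the path and values are made up):

{code}
// Hypothetical usage of the simple create; path and values are illustrative only.
FSDataOutputStream out = Files.create(new Path("/user/alice/part-0"),
                                      FsPermission.getDefault(),
                                      true /* overwrite */);
out.writeBytes("hello\n");
out.close();
{code}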

The second create method is fragile since many of its parameters are ints and one can easily mix them up.
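
For reference, the second create is roughly of the following shape; the exact parameter order and types below are an assumption based on the -1 defaults mentioned above, not the actual signature from the patch:

{code}
// Rough sketch only; parameter order and types are assumptions. -1 means "use the default".
public static FSDataOutputStream create(Path f,
                                  FsPermission permission,
                                  boolean overwrite,
                                  long blockSize,        // -1 => default
                                  short replication,     // -1 => default
                                  int bufferSize,        // -1 => default
                                  Progressable progress) throws IOException;
// Mixing up the numeric literals (say, passing the buffer size where the
// block size is expected) still compiles, which is what makes it fragile.
{code}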

Given that most app writers will use the simpler method above, I am not as worried as I used to be.


Were you thinking of something like:
{code}
// Style 1
createFile = create(path, permission, overwrite);
createFile.setBlockSize(512);   // optional; most are okay with the defaults
outputStream = createFile.open();

// Style 2
outputStream = create(path, permission, overwrite); // the common case where the defaults are fine

createParms.setBlockSize(512);                       // for when the defaults are not fine
outputStream = create(path, permission, overwrite, createParms);

// For comparison, the proposed patch would have required the following
outputStream = create(path, permission, overwrite); // common case when the defaults are fine
outputStream = create(path, permission, overwrite, 512, -1, -1, progress); // when you want to override the defaults

{code}
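
For Style 2, the createParms argument would be some kind of parameter holder, e.g. the hypothetical sketch below (illustrative only, not part of the attached Files.java):

{code}
// Hypothetical parameter holder for Style 2; names are illustrative only.
public class CreateParams {
  private long  blockSize   = -1;   // -1 => take the file system's default
  private short replication = -1;
  private int   bufferSize  = -1;

  public CreateParams setBlockSize(long blockSize) {
    this.blockSize = blockSize;
    return this;                    // return this so setters can be chained
  }
  // ... analogous setters for replication and bufferSize
}
{code}

Either style keeps the common case down to three arguments while still giving a typed escape hatch for the rare overrides.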





> Improved files system interface for the application writer.
> -----------------------------------------------------------
>
>                 Key: HADOOP-4952
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4952
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>         Attachments: Files.java
>
>
> Currently the FileSystem interface serves two purposes:
> - an application writer's interface for using the Hadoop file system
> - a file system implementer's interface (e.g. hdfs, local file system, kfs, etc)
> This Jira proposes that we provide a simpler interface for the application writer and leave the FileSystem interface for the implementer of a file system.
> - The FileSystem interface has a confusing set of methods for the application writer
> - We could make it easier to take advantage of the URI file naming
> ** The current approach is to get a FileSystem instance by supplying the URI and then access that namespace. It is consistent for the FileSystem instance to not accept URIs for other schemes, but we can do better.
> ** The special copyFromLocalFile can be generalized as a copyFile where the src or target can be any URI, including a local one.
> ** The proposed scheme (below) simplifies this.
> - The client-side config can be simplified.
> ** New config() by default uses the default config. Since this is the common usage pattern, one should not need to always pass the config as a parameter when accessing the file system.
> ** It does not handle multiple file systems well. Today a site.xml is derived from a single Hadoop cluster; this does not make sense for multiple Hadoop clusters, which may have different defaults.
> ** Further, one should need very little to configure the client side:
> *** Default file system
> *** Block size 
> *** Replication factor
> *** Scheme to class mapping
> ** It should be possible to take the block size and replication factor defaults from the target file system, rather than the client-side config. I am not suggesting we don't allow setting client-side defaults, but most clients do not care and would find it simpler to take the defaults for their systems from the target file system.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

