hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3598) WebHDFS: support file concat
Date Fri, 25 Jan 2013 23:23:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563153#comment-13563153 ]

Konstantin Shvachko commented on HDFS-3598:
-------------------------------------------

Let me summarise the ideas expressed here.
The requirement is to have a common interface for WebHDFS and DistributedFileSystem, so that
any application written for HDFS (DistributedFileSystem) would also work with WebHDFS by
just replacing the URI scheme.
{code}
hadoop fs -ls hdfs://nn1/user/shv
hadoop fs -ls webhdfs://nn2/user/shv
{code}
These two commands should work the same.
So far the common interface has been FileSystem. So, to satisfy the above requirement, concat()
should be added to FileSystem. Then all other subclasses of FileSystem would need to implement concat().
Per Harsh's observation, LocalFileSystem in particular will have to implement concat(), and
the LFS implementation will work on files of arbitrary size, unlike the DFS one.

Is this confusing? Well, yes and no. Yes, because the behaviour is different, indeed. No,
because implementations in different file systems can differ in their restrictions and semantics.
We already have that in LFS with getFileBlockLocations() returning "localhost:50010", which
could just as well have been "IDontKnow" or "OverTheHills:ThroughTheWoods".
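
To make the point about differing restrictions concrete, here is a minimal sketch in plain Java. It is not Hadoop code: ConcatCapableFs, LocalConcatFs and BlockAlignedConcatFs are hypothetical names, and the "full blocks only" rule is an illustrative restriction, not the actual DFS rule set. It only shows one interface method with an unrestricted and a restricted implementation behind it.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical stand-in for the common FileSystem interface (not Hadoop's class).
interface ConcatCapableFs {
    // Appends each source to the existing target, in order, then removes the sources.
    void concat(Path target, Path[] sources) throws IOException;
}

// Unrestricted, LFS-like implementation: works on files of arbitrary size.
final class LocalConcatFs implements ConcatCapableFs {
    @Override
    public void concat(Path target, Path[] sources) throws IOException {
        // APPEND without CREATE: the target must already exist.
        try (OutputStream out = Files.newOutputStream(target, StandardOpenOption.APPEND)) {
            for (Path src : sources) {
                Files.copy(src, out);
            }
        }
        for (Path src : sources) {
            Files.delete(src);
        }
    }
}

// Restricted implementation behind the same method. Assumption for illustration:
// the target and every source except the last must consist of full blocks.
final class BlockAlignedConcatFs implements ConcatCapableFs {
    private final long blockSize;
    private final ConcatCapableFs delegate = new LocalConcatFs();

    BlockAlignedConcatFs(long blockSize) {
        this.blockSize = blockSize;
    }

    @Override
    public void concat(Path target, Path[] sources) throws IOException {
        requireFullBlocks(target);
        for (int i = 0; i < sources.length - 1; i++) {
            requireFullBlocks(sources[i]);
        }
        delegate.concat(target, sources);
    }

    private void requireFullBlocks(Path file) throws IOException {
        if (Files.size(file) % blockSize != 0) {
            throw new UnsupportedOperationException(
                file + " does not end on a block boundary");
        }
    }
}
```

A caller written against ConcatCapableFs runs unchanged with either implementation. The restricted one rejects inputs it cannot handle instead of hiding the method.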

A new API as Nicholas proposes (HadoopDistributedFileSystem) would work, but it will separate
WebHDFS and DFS from other file systems. I am guessing HttpFS will also want to extend that,
then LFS will follow suit, and we will end up in the same place, with HadoopDistributedFileSystem
essentially equivalent to FileSystem.

I think we should add concat() to FileSystem. People will start using it, and somebody might
try implementing full concatenation. Exposing the restricted API may be more beneficial in this case
than hiding it, especially since the restricted version is pretty powerful by itself.
Does this make sense to you guys?
                
> WebHDFS: support file concat
> ----------------------------
>
>                 Key: HDFS-3598
>                 URL: https://issues.apache.org/jira/browse/HDFS-3598
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Plamen Jeliazkov
>
> In trunk and branch-2, DistributedFileSystem has a new concat(Path trg, Path [] psrcs)
> method.  WebHDFS should support it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
