hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3307) Archives in Hadoop.
Date Tue, 29 Apr 2008 18:11:56 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593070#action_12593070 ]

Doug Cutting commented on HADOOP-3307:

> the intent is to change path to make it work.... 

Would you special-case the handling of "har:" URIs in Path?  Or would you always parse queries
as part of the hierarchical path?  Both of these sound like bad ideas to me.

We should not add special functionality to FileSystem or Path for "har:" URIs.  We have a
proposal that layers cleanly on top of the existing FileSystem and Path implementations.
Alternatively, we might consider generic extensions to FileSystem and/or Path, like symbolic
links or mount points, to see whether these might facilitate a more transparent archive
implementation.  But we should not add special-purpose hacks for a particular archive format
to these generic classes.

Mounts of various sorts would be fairly easy to add, but perhaps not that easy to use.  I
proposed a simple version above that requires no changes to existing code.  A mount capability
that permitted one to attach a FileSystem implementation at an arbitrary point in the URI
space would not be overly hard to add.
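
To make the mount idea concrete, here is a minimal sketch of a mount table that picks the longest matching mount point for a path.  Nothing here is an existing Hadoop API: the class name MountTableSketch and its methods are hypothetical, and a plain target URI stands in for the FileSystem implementation that would be attached at the mount point.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not a Hadoop class: maps mount points in the path
// space to the URI of the file system (or archive) that serves them.
final class MountTableSketch {
    // Mount point path -> target URI (stand-in for a FileSystem instance).
    private final Map<String, URI> mounts = new HashMap<>();

    void mount(String mountPoint, URI target) {
        String mp = stripTrailingSlashes(mountPoint);
        if (mp.equals("/")) {
            throw new IllegalArgumentException("this sketch does not support mounting at the root");
        }
        mounts.put(mp, target);
    }

    // Resolves a path through the longest matching mount point.
    // Returns null when no mount point covers the path.
    URI resolve(String path) {
        String p = stripTrailingSlashes(path);
        String best = null;
        for (String mp : mounts.keySet()) {
            boolean covers = p.equals(mp) || p.startsWith(mp + "/");
            if (covers && (best == null || mp.length() > best.length())) {
                best = mp;
            }
        }
        if (best == null) {
            return null;
        }
        // Whatever follows the mount point is appended to the target URI.
        String rest = p.substring(best.length());
        String base = stripTrailingSlashes(mounts.get(best).toString());
        return URI.create(base + rest);
    }

    private static String stripTrailingSlashes(String s) {
        while (s.length() > 1 && s.endsWith("/")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        MountTableSketch table = new MountTableSketch();
        table.mount("/archives/logs", URI.create("hdfs://nn.example.com:8020/user/alice/logs.har"));
        System.out.println(table.resolve("/archives/logs/2008/04/part-00000"));
    }
}
```

The lookup itself is simple; the table is the state that has to live somewhere, which is the difficulty with this approach.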

The primary downside of mount-based approaches is that they require state.  One would have
to add something to the configuration or job for each mount point, or require all FileSystem
implementations to know how to store a mount, or add a mount file type, or somesuch.  Note
that this is not a problem with Unix mount, since there's only one system involved, but in
a distributed system like Hadoop we need to either transmit the mount points with code (e.g.,
in the job) or somehow store them in the filesystem.
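
As a rough illustration of what transmitting the mount points with the job could involve, the sketch below flattens a mount table into a single string value and parses it back.  The property name, the "mountPoint=target" comma-separated format, and the class MountConfigSketch are all assumptions made for this example, not anything Hadoop defines; the format also assumes that neither mount points nor targets contain ',' and that mount points contain no '='.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: carries mount points as one configuration value.
final class MountConfigSketch {
    // Made-up property name, used only for illustration.
    static final String KEY = "example.fs.mounts";

    // Encodes the table as "mountPoint=target,mountPoint=target,...".
    static String encode(Map<String, String> mounts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : mounts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // Parses the value produced by encode(), preserving entry order.
    static Map<String, String> decode(String value) {
        Map<String, String> mounts = new LinkedHashMap<>();
        if (value == null || value.isEmpty()) {
            return mounts;
        }
        for (String entry : value.split(",")) {
            int eq = entry.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("malformed mount entry: " + entry);
            }
            mounts.put(entry.substring(0, eq), entry.substring(eq + 1));
        }
        return mounts;
    }

    public static void main(String[] args) {
        Map<String, String> mounts = new LinkedHashMap<>();
        mounts.put("/archives/logs", "hdfs://nn.example.com:8020/user/alice/logs.har");
        String value = encode(mounts);
        System.out.println(KEY + " = " + value);
        System.out.println(decode(value));
    }
}
```

Every job that reads through a mount would need this value set, and every task would need to rebuild the table from it before opening a file.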

The current proposal, embedding the URI of the archive within a "har:" URI, will both solve
the problems at hand and require no architectural changes to the filesystem.  The only downside
is that archive file naming is a little obtuse.  Long-term, the addition of symbolic links
to FileSystem might address that, no?
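
For illustration only, here is one way an archive's URI could be embedded in a "har:" URI and recovered again using nothing but java.net.URI.  The encoding is an assumption made for this sketch (underlying scheme and authority joined with '-' in the authority, archive path ending in ".har" followed by the path inside the archive); this message does not spell out the exact format, and the class HarUriSketch is hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch of embedding an archive URI in a "har:" URI.
final class HarUriSketch {
    static final String SUFFIX = ".har";

    // hdfs://host:port/a/b.har + /x/y  ->  har://hdfs-host:port/a/b.har/x/y
    static URI embed(URI archive, String pathInArchive) {
        if (archive.getAuthority() == null) {
            throw new IllegalArgumentException("this sketch requires an authority: " + archive);
        }
        String authority = archive.getScheme() + "-" + archive.getAuthority();
        return build("har", authority, archive.getPath() + pathInArchive);
    }

    // Recovers the URI of the archive file in the underlying file system.
    // Assumes the underlying scheme itself contains no '-'.
    static URI archiveOf(URI har) {
        String authority = har.getAuthority();
        int dash = authority.indexOf('-');
        String path = har.getPath();
        return build(authority.substring(0, dash), authority.substring(dash + 1),
                     path.substring(0, archiveEnd(path)));
    }

    // Recovers the path of the entry inside the archive ("/" for the archive root).
    static String pathInArchive(URI har) {
        String rest = har.getPath().substring(archiveEnd(har.getPath()));
        return rest.isEmpty() ? "/" : rest;
    }

    // Index just past the first ".har" path component.
    private static int archiveEnd(String path) {
        int i = path.indexOf(SUFFIX + "/");
        if (i >= 0) {
            return i + SUFFIX.length();
        }
        if (path.endsWith(SUFFIX)) {
            return path.length();
        }
        throw new IllegalArgumentException("no " + SUFFIX + " component in: " + path);
    }

    private static URI build(String scheme, String authority, String path) {
        try {
            return new URI(scheme, authority, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI archive = URI.create("hdfs://nn.example.com:8020/user/alice/logs.har");
        URI har = embed(archive, "/2008/04/part-00000");
        System.out.println(har);
        System.out.println(archiveOf(har) + " : " + pathInArchive(har));
    }
}
```

Because everything needed to locate the archive travels inside the URI itself, no mount table or other side state has to be shipped with the job; the cost is the long, somewhat awkward name.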

> Archives in Hadoop.
> -------------------
>                 Key: HADOOP-3307
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3307
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Mahadev konar
>            Assignee: Mahadev konar
>             Fix For: 0.18.0
> This is a new feature for archiving and unarchiving files in HDFS. 

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
