hadoop-common-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3307) Archives in Hadoop.
Date Sat, 26 Apr 2008 08:03:55 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592557#action_12592557 ]

Joydeep Sen Sarma commented on HADOOP-3307:

if 'har' is truly a client-side abstraction - then assuming that the underlying protocol is
hdfs breaks this abstraction - no? one could imagine har archives on top of the local file
system - or for that matter KFS or any other future file system (say Lustre?).

also - the 'har' protocol is redundantly indicated in the uri scheme as well as the file extension.
conceivably - one could drop it from the uri scheme (and thereby retain the ability to work
with different file systems) and use the presence of the .har extension in the file path to
automatically layer on an archive file system.
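
To make the extension-based idea concrete, here is a minimal sketch in plain Java. It is not Hadoop code: the class HarPathSplitter and its split method are hypothetical names invented for illustration, and it assumes that any path component ending in .har marks the archive boundary. It only shows that the underlying scheme (hdfs, file, kfs, ...) can stay in the URI untouched while the .har component alone triggers the switch.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Hadoop): finds the first path component
// that ends in ".har" and splits the path at that point.
final class HarPathSplitter {
    private HarPathSplitter() {}

    /**
     * Returns {archivePath, pathInsideArchive} when some component ends in
     * ".har", or null when the path does not pass through an archive.
     * The scheme and authority, if present, stay part of archivePath.
     */
    static String[] split(String path) {
        String[] parts = path.split("/");
        StringBuilder archive = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                archive.append('/');
            }
            archive.append(parts[i]);
            if (parts[i].endsWith(".har")) {
                // Everything after the .har component is resolved inside the archive.
                String inner = String.join("/",
                        Arrays.copyOfRange(parts, i + 1, parts.length));
                return new String[] { archive.toString(), inner };
            }
        }
        return null; // no archive component: use the underlying file system directly
    }
}
```

With this sketch, a path on hdfs and a path on the local file system are handled identically; only the first element of the result differs, and that is what would be handed to whatever file system the scheme names.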

if done right - one should be able to support any archive format no? essentially - we are
just associating the .har extension as a trigger to switch over to some nested file system
(in this case, the har file system). one would think that in future a .zip extension could
be associated with a ZIP file system provider which would allow nested view of the files/directories
underneath .. (this would be quite nice, since many data sets float around as zip files.
one could just copy them into hdfs - and pronto - we are all set).
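
The extension-to-provider association could be sketched roughly as below. This is an illustration in plain Java, not a proposal for the actual Hadoop API: NestedFsRegistry, register and providerFor are hypothetical names, and providers are represented by simple string labels rather than real file system classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical registry (not a Hadoop API): associates a file extension with
// the name of a nested file system provider that can present the file's contents.
final class NestedFsRegistry {
    private final Map<String, String> providersByExtension = new LinkedHashMap<>();

    /** Registers a provider label for an extension such as ".har" or ".zip". */
    void register(String extension, String providerName) {
        providersByExtension.put(extension, providerName);
    }

    /**
     * Returns the provider for the first path component whose extension is
     * registered, or empty when the path needs no nested file system.
     */
    Optional<String> providerFor(String path) {
        for (String component : path.split("/")) {
            int dot = component.lastIndexOf('.');
            if (dot < 0) {
                continue; // component has no extension
            }
            String provider = providersByExtension.get(component.substring(dot));
            if (provider != null) {
                return Optional.of(provider);
            }
        }
        return Optional.empty();
    }
}
```

Under these assumptions, supporting a new archive format amounts to one more register call; nothing in the lookup is specific to har.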

am also curious about the 'parallel creation' aspect (since that seems to be the main argument
for using a new archive format). how do we populate a single hdfs file (backing the archive)
in parallel? 

> Archives in Hadoop.
> -------------------
>                 Key: HADOOP-3307
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3307
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Mahadev konar
>            Assignee: Mahadev konar
>             Fix For: 0.18.0
> This is a new feature for archiving and unarchiving files in HDFS. 

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
