hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3307) Archives in Hadoop.
Date Mon, 02 Jun 2008 14:25:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601632#action_12601632 ]

Devaraj Das commented on HADOOP-3307:

1) The query part in the creation of the URI can be removed (in fact we probably should flag
an error if the har path contains a '?' since it is not a valid Path)
2) decodeURI should be done first and then the har archive path can be extracted
3) getHarAuth needn't parse the URI every time since it is constant. The auth can just
be stored in a class variable.
4) open() and other filesystem calls should support taking just the fragment path to a file
within the archive.
5) Why is fileStatusInIndex storing the Store object in a list while going through the master
index? Isn't the list always going to be of size 1 (if the file is present in the archive)?
6) The index files are not closed in the fileStatusInIndex call. This might lead to problems
in cases where the underlying filesystem is the localfs (where open actually returns a
file descriptor). But I am also not sure whether we should open and close on every call to
fileStatusInIndex. Can we somehow cache the handles to the index files and reuse them?
7) When we create a part file, can we record things like replication factor, permissions,
etc., emit them just like we emit the other info (partfilename, etc.) during archive
creation, and store them in the index file? That way we don't have to fake everything in the
8) In listStatus, the start and end braces are missing for the if/else block
9) In listStatus, the check hstatus.isDir()?0:hstatus.getLength() seems redundant. hstatus.isDir()
is always going to be false.
10) I don't understand clearly why makeRelative is done in the listStatus and getFileStatus
11) Do you enforce the .har in the archive name when it is created?
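
As an illustration of points 1 and 3 above, here is a minimal sketch and not the code from the patch. HarUriSketch is a hypothetical stand-in class, and it assumes that the value returned by getHarAuth is simply the authority part of the URI, which the patch may compute differently. The query check happens once in the constructor, and the authority is kept in a field instead of being re-parsed on every call.

```java
import java.net.URI;

// Hypothetical stand-in for the filesystem class under review; not the patch's code.
final class HarUriSketch {
    // Point 3: the authority never changes for a given instance, so store it once.
    // Assumption: the "har auth" is just the URI authority.
    private final String harAuth;

    HarUriSketch(URI uri) {
        // Point 1: flag an error if the URI carries a query part ('?').
        if (uri.getRawQuery() != null) {
            throw new IllegalArgumentException("har path must not contain '?': " + uri);
        }
        this.harAuth = uri.getAuthority();
    }

    // Returns the cached value; no parsing happens here.
    String getHarAuth() {
        return harAuth;
    }
}
```

With this shape, a URI that contains a query part is rejected up front rather than being silently carried along.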

I am not done reviewing the entire patch yet ..
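
For points 5 and 6, a rough sketch of a lookup that always closes the index file and returns a single entry instead of a list. The class IndexLookupSketch, the method findEntry, and the line format (path followed by a space and the remaining fields) are all assumptions made for illustration; the real index layout is defined by the patch. It uses plain java.nio rather than the Hadoop FileSystem API so that it stands alone, and it does not attempt the handle caching raised in point 6.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; the index line format used here is an assumption.
final class IndexLookupSketch {
    // Returns the first index line whose first field equals key, or null if absent.
    static String findEntry(Path indexFile, String key) throws IOException {
        // try-with-resources closes the reader on every exit path (point 6).
        try (BufferedReader in = Files.newBufferedReader(indexFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(key + " ")) {
                    return line; // at most one match is needed, so no list (point 5)
                }
            }
            return null;
        }
    }
}
```

Opening and closing on every call is the simple variant; whether to keep handles open and reuse them is the open question in point 6.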

> Archives in Hadoop.
> -------------------
>                 Key: HADOOP-3307
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3307
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Mahadev konar
>            Assignee: Mahadev konar
>             Fix For: 0.18.0
>         Attachments: hadoop-3307_1.patch, hadoop-3307_2.patch
> This is a new feature for archiving and unarchiving files in HDFS. 

