hadoop-common-issues mailing list archives

From "Koji Noguchi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6097) Multiple bugs w/ Hadoop archives
Date Thu, 20 Aug 2009 16:26:14 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745477#action_12745477 ]

Koji Noguchi commented on HADOOP-6097:

Ben, sorry about that. I was wrong.

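I got confused: since HarFileSystem.getUri() returns har://archivepath,
I mistakenly thought FileSystem.CACHE would use the path as part of the hash key.
It does not, so a second archive opened by the same client resolves to the first one's cached instance: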

$ hadoop dfs -ls har:///user/knoguchi/test.har har:///user/knoguchi/test2.har            
Found 1 items
drw-r--r--   - knoguchi users          0 2009-08-18 18:52 /user/knoguchi/test.har/user
ls: Invalid file name: /user/knoguchi/test2.har in har:///user/knoguchi/test.har

$ hadoop dfs -ls har:///user/knoguchi/test2.har har:///user/knoguchi/test.har
Found 1 items
drw-------   - knoguchi users          0 2009-08-17 19:15 /user/knoguchi/test2.har/user
ls: Invalid file name: /user/knoguchi/test.har in har:///user/knoguchi/test2.har
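
The behaviour above can be modelled with a small, self-contained sketch. It is not Hadoop's FileSystem.CACHE code: the class HarCacheKeySketch, its Key type, and the lookup() helper are hypothetical names, and the only assumption taken from this issue is that the cache key consists of (scheme, authority, username) and ignores the path.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical model of a cache keyed by (scheme, authority, username).
// This is NOT Hadoop's implementation; it only illustrates the collision.
final class HarCacheKeySketch {

    static final class Key {
        final String scheme;
        final String authority; // null for URIs such as har:///user/...
        final String user;

        Key(URI uri, String user) {
            this.scheme = uri.getScheme();
            this.authority = uri.getAuthority();
            this.user = user;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return Objects.equals(scheme, k.scheme)
                && Objects.equals(authority, k.authority)
                && Objects.equals(user, k.user);
        }

        @Override
        public int hashCode() {
            return Objects.hash(scheme, authority, user);
        }
    }

    // Returns the archive path of the "filesystem" the cache hands back.
    // The path is recorded only when the entry is first created.
    static String lookup(Map<Key, String> cache, URI uri, String user) {
        return cache.computeIfAbsent(new Key(uri, user), k -> uri.getPath());
    }

    public static void main(String[] args) {
        Map<Key, String> cache = new HashMap<>();
        URI first = URI.create("har:///user/knoguchi/test.har");
        URI second = URI.create("har:///user/knoguchi/test2.har");

        System.out.println(lookup(cache, first, "knoguchi"));
        // The second lookup hits the entry created for test.har,
        // because the path is not part of the key.
        System.out.println(lookup(cache, second, "knoguchi"));
    }
}
```

Both lookups print /user/knoguchi/test.har, which mirrors the "Invalid file name: /user/knoguchi/test2.har in har:///user/knoguchi/test.har" error: whichever archive is opened first wins.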

> Multiple bugs w/ Hadoop archives
> --------------------------------
>                 Key: HADOOP-6097
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6097
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.18.0, 0.18.1, 0.18.2, 0.18.3, 0.19.0, 0.19.1, 0.20.0
>            Reporter: Ben Slusky
>             Fix For: 0.20.1
>         Attachments: HADOOP-6097.patch
>
> Found and fixed several bugs involving Hadoop archives:
> - In makeQualified(), the sloppy conversion from Path to URI and back mangles the path
>   if it contains an escape-worthy character.
> - It's possible that fileStatusInIndex() may have to read more than one segment of the
>   index. The LineReader and count of bytes read need to be reset for each block.
> - har:// connections cannot be indexed by (scheme, authority, username) -- the path is
>   significant as well. Caching them in this way limits a hadoop client to opening one archive
>   per filesystem. It seems to be safe not to cache them, since they wrap another connection
>   that does the actual networking.
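
The first item in the list above (path mangling on a Path to URI round trip) can be illustrated with plain java.net.URI. This message does not show the code in makeQualified(), so the following is only a sketch of the general hazard, not the actual bug or its fix. The class UriRoundTripSketch, its helper methods, and the file name "my file.har" are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch of how a careless path -> URI -> path round trip can mangle a path
// that contains a character needing escaping (here, a space).
final class UriRoundTripSketch {

    // Builds a har: URI from an unescaped path. The multi-argument URI
    // constructors escape illegal characters and always escape '%'.
    static URI harUri(String path) {
        try {
            return new URI("har", null, path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Sloppy: feeds the already-escaped raw path back in as if it were
    // unescaped, so "%20" is escaped again to "%2520".
    static String sloppyRoundTrip(String path) {
        URI once = harUri(path);
        URI twice = harUri(once.getRawPath());
        return twice.getPath();
    }

    // Careful: uses the decoded path, so the round trip is lossless.
    static String carefulRoundTrip(String path) {
        URI once = harUri(path);
        URI twice = harUri(once.getPath());
        return twice.getPath();
    }

    public static void main(String[] args) {
        String path = "/user/knoguchi/my file.har"; // hypothetical file name
        System.out.println(sloppyRoundTrip(path));
        System.out.println(carefulRoundTrip(path));
    }
}
```

The sloppy variant prints /user/knoguchi/my%20file.har, which no longer names the original file, while the careful variant prints the path unchanged.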

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
