hadoop-mapreduce-issues mailing list archives

From "Rodrigo Schmidt (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1548) Hadoop archives should be able to preserve times and other properties from original files
Date Wed, 03 Mar 2010 23:36:28 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840948#action_12840948 ]

Rodrigo Schmidt commented on MAPREDUCE-1548:

Nicholas, yes, I would like to store this information without applying it to the part files (as
you pointed out, that would probably lead to inconsistencies). However, I would like the correct
ownership, permissions, and times to show up when someone does an ls on a har:// path. Currently,
the code creates FileStatus objects by taking part of the information (e.g., size) from inside
the index file and the rest (replication, owner, times) from the properties of the index file
itself, which is not correct. For example, if you do an ls on a har:// path, the contents
show up with a replication factor of 10 (the default for the index file), although the
part file containing the data will probably have a replication factor of 3 (the HDFS default).

Keeping the properties does not really prevent someone who has access to the har directory
and files from accessing the part file that contains the data, but it would help a lot if we wanted
to unhar the files at some point and keep their original properties. That's exactly what I'm
proposing in this JIRA.

In short, I would like to store the properties in the index file, list them on an ls command,
and return them correctly from getFileStatus(), not much more than that. I think this would
be a good starting point for future, more complicated extensions.
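
As a rough illustration of the idea (a sketch only, not the actual har index format and not
the Hadoop FileStatus API), the class below keeps each stored file's own properties next to
its index entry and reads them back for an ls-style listing. The class name HarEntryProps,
the space-separated line layout, and all method names are hypothetical; paths containing
spaces would need escaping, which is left out here.

```java
/** Hypothetical per-file properties kept alongside a har index entry. */
final class HarEntryProps {
    final String path;
    final long length;
    final long modificationTime; // milliseconds since the epoch
    final long accessTime;       // milliseconds since the epoch
    final short permission;      // permission bits, e.g. 0644
    final String owner;
    final String group;

    HarEntryProps(String path, long length, long modificationTime, long accessTime,
                  short permission, String owner, String group) {
        this.path = path;
        this.length = length;
        this.modificationTime = modificationTime;
        this.accessTime = accessTime;
        this.permission = permission;
        this.owner = owner;
        this.group = group;
    }

    /** Serializes the entry as one line; the field order is made up for this sketch. */
    String toIndexLine() {
        return String.join(" ",
                path,
                Long.toString(length),
                Long.toString(modificationTime),
                Long.toString(accessTime),
                Integer.toOctalString(permission),
                owner,
                group);
    }

    /** Parses a line produced by toIndexLine(). */
    static HarEntryProps parse(String line) {
        String[] f = line.split(" ");
        if (f.length != 7) {
            throw new IllegalArgumentException("expected 7 fields, got " + f.length);
        }
        return new HarEntryProps(
                f[0],
                Long.parseLong(f[1]),
                Long.parseLong(f[2]),
                Long.parseLong(f[3]),
                Short.parseShort(f[4], 8), // permission is stored in octal
                f[5],
                f[6]);
    }

    /** Renders the permission bits the way ls does for a plain file, e.g. -rw-r--r--. */
    static String permString(short perm) {
        char[] symbols = {'r', 'w', 'x'};
        StringBuilder sb = new StringBuilder("-");
        for (int i = 8; i >= 0; i--) {
            boolean set = ((perm >> i) & 1) == 1;
            sb.append(set ? symbols[(8 - i) % 3] : '-');
        }
        return sb.toString();
    }

    /** One listing line built only from the stored properties, not from the index file's own. */
    String toLsLine() {
        return permString(permission) + " " + owner + " " + group + " " + length + " " + path;
    }
}
```

With something along these lines, getFileStatus() on a har:// path could report the owner,
group, permission, and times recorded for the original file rather than those of the index
file, and an unhar tool could restore them later.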

> Hadoop archives should be able to preserve times and other properties from original files
> -----------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-1548
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1548
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: harchive
>            Reporter: Rodrigo Schmidt
>            Assignee: Rodrigo Schmidt
> Files inside hadoop archives don't keep their original:
> - modification time
> - access time
> - permission
> - owner
> - group
> All such properties are currently taken from the file storing the archive index, and
> not from the stored files. This doesn't look correct.
> It should be possible to preserve the original properties of the stored files.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
