hadoop-common-issues mailing list archives

From "Koji Noguchi (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-7519) hadoop fs commands should support tar/gzip or an equivalent
Date Wed, 10 Aug 2011 01:25:27 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated HADOOP-7519:

    Attachment: hadoop-7519-0.20.2XX-1.patch

Some of my users have needed this in the past, so I wrote a short wrapper around org.apache.tools.tar.
I got the idea from reading http://stuartsierra.com/2008/04/24/a-million-little-files, where
the author converted a tar file into a sequence file.

This is not at all ready to commit, but I think it gives the idea. It needs ant.jar on its classpath:

% export HADOOP_CLASSPATH=./contrib/tar/lib/ant.jar
% hadoop jar contrib/tar/hadoop-tar.jar --help
usage: hadoop jar hadoop-tar.jar [options]
 -c,--create                 create a new archive
 -C,--directory <DIR>        Set the working directory to DIR
 -f,--file <FILE>            Use archive file (default '-' for
    --help                   show help message
    --overwrite              overwrite existing directory
 -P,--absolute-names         don't strip leading / from file name
 -p,--preserve-permissions   apply recorded permissions instead of
                             applying user's umask when extracting files
    --same-group             create extracted files with the same group id
    --same-owner             create extracted files with the same
 -t,--list                   list files from an archive
 -v,--verbose                print verbose output
 -x,--extract                extract files from an archive
 -z,--compress               filter the archive through
                             compress/uncompress gzip
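The patch itself is described as a thin wrapper around org.apache.tools.tar from ant.jar and is not reproduced here. As a rough illustration of what `-t` and `-tz` involve, here is a hypothetical JDK-only sketch. The class `TarLister` and its `list` method are made-up names, and the sketch deliberately does not use `org.apache.tools.tar`. It walks the 512-byte tar headers and, when asked, first wraps the input in `java.util.zip.GZIPInputStream`. It does not verify header checksums, and it does not interpret GNU long-name or pax extended-header records.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

/**
 * Hypothetical sketch (not the attached patch, not org.apache.tools.tar):
 * lists entry names of a tar stream like "tar -t", optionally filtering
 * the stream through gzip first like "-z".
 */
final class TarLister {
    private static final int BLOCK = 512; // tar headers and data are 512-byte aligned

    static List<String> list(InputStream raw, boolean gzipped) throws IOException {
        DataInputStream in =
                new DataInputStream(gzipped ? new GZIPInputStream(raw) : raw);
        List<String> names = new ArrayList<>();
        byte[] header = new byte[BLOCK];
        while (true) {
            try {
                in.readFully(header);
            } catch (EOFException e) {
                break; // stream ended without the zero-block end marker
            }
            if (isZeroBlock(header)) {
                break; // a zero-filled block marks the end of the archive
            }
            String name = field(header, 0, 100);
            // POSIX ustar headers (magic "ustar\0") may split a long path into
            // prefix (offset 345) + name; GNU-format headers reuse those bytes.
            if (field(header, 257, 6).equals("ustar")) {
                String prefix = field(header, 345, 155);
                if (!prefix.isEmpty()) {
                    name = prefix + "/" + name;
                }
            }
            names.add(name);
            // Skip the entry's data, which is padded up to a whole block.
            long size = parseOctal(header, 124, 12);
            skipFully(in, (size + BLOCK - 1) / BLOCK * BLOCK);
        }
        return names;
    }

    /** Reads a NUL-terminated (or full-width) string field from a header. */
    private static String field(byte[] b, int off, int len) {
        int end = off;
        while (end < off + len && b[end] != 0) {
            end++;
        }
        return new String(b, off, end - off, StandardCharsets.UTF_8);
    }

    /** Parses an ASCII octal field; leading blanks are skipped, NUL/space ends it. */
    private static long parseOctal(byte[] b, int off, int len) {
        long value = 0;
        boolean seenDigit = false;
        for (int i = off; i < off + len; i++) {
            int c = b[i];
            if (c >= '0' && c <= '7') {
                value = value * 8 + (c - '0');
                seenDigit = true;
            } else if (seenDigit) {
                break;
            }
        }
        return value;
    }

    private static boolean isZeroBlock(byte[] block) {
        for (byte x : block) {
            if (x != 0) {
                return false;
            }
        }
        return true;
    }

    private static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                // skip() may make no progress; fall back to read() to detect EOF
                if (in.read() < 0) {
                    throw new EOFException("archive truncated inside entry data");
                }
                skipped = 1;
            }
            n -= skipped;
        }
    }
}
```

In a Hadoop setting the `InputStream` could come from `FileSystem.open(path)`, which returns an `FSDataInputStream`. That would let a tar stored on HDFS be listed without first copying it locally. This wiring is an assumption and is not part of the patch.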

> hadoop fs commands should support tar/gzip or an equivalent
> -----------------------------------------------------------
>                 Key: HADOOP-7519
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7519
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.20.1
>            Reporter: Keith Wiley
>            Priority: Minor
>              Labels: hadoop
>         Attachments: hadoop-7519-0.20.2XX-1.patch
> The "hadoop fs" subcommand should offer options for batching, unbatching, compressing,
> and uncompressing files on HDFS, i.e. the equivalent of "hadoop fs -tar" or "hadoop fs -gzip".
> These commands would greatly ease moving large amounts of data (especially data spread across
> many files) to and from HDFS.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

