hadoop-common-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: InputFormat for tarball
Date Tue, 19 Feb 2008 20:31:35 GMT
Goel, Ankur wrote:
> Hi All,
>            Is there an input format available for reading from tarballs
> (.tar.gz files) ?

Not at present.  There is support for reading .gz files, but not .tar
files.  A problem is that there's no way to read a chunk of a .tar.gz
archive without reading everything preceding that chunk.  So, if such
an InputFormat were written, it would be unable to efficiently split the
processing of an archive among map tasks.
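
To illustrate the splitting problem, here is a minimal sketch that uses
only the JDK's java.util.zip classes, not Hadoop's own codec or
InputFormat classes.  The class name GzipSplitDemo and its helper
methods are hypothetical, made up for this example.  It compresses some
records into a single gzip stream, then tries to start decompressing
from the beginning and from the middle, roughly what a map task handed
the second half of a file would have to do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of Hadoop.
class GzipSplitDemo {

    // Compress a byte array into a single gzip stream.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Try to decompress starting at byte `offset` of the compressed data.
    // Returns the decompressed bytes, or null if the decoder rejects the input.
    static byte[] decodeFrom(byte[] gz, int offset) {
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(gz, offset, gz.length - offset))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            // No gzip header at this offset, or corrupt deflate data / checksum.
            return null;
        }
    }

    public static void main(String[] args) {
        // Build some line-oriented records to compress.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("record ").append(i).append('\n');
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] gz = gzip(raw);

        System.out.println("decodable from start:  " + (decodeFrom(gz, 0) != null));
        System.out.println("decodable from middle: " + (decodeFrom(gz, gz.length / 2) != null));
    }
}
```

A reader that starts at offset 0 recovers every record, while one that
starts mid-stream has no header and no decompressor state to work
from, so the only option is a single reader per file.  A tar layer
inside the gzip stream does not change this, since the tar entries are
only visible after decompression.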

