ant-user mailing list archives

From Andy Stevens <insomniacpeng...@googlemail.com>
Subject Re: Question about using Tar with Hadoop files
Date Tue, 20 Dec 2011 09:03:35 GMT
You didn't mention what version of Ant was involved...

Andy.
On 20 Dec 2011 05:31, "Frank Astier" <fastier@yahoo-inc.com> wrote:

> Hi -
>
> I’m trying to use the Apache Tar package (1.8.2) in a Java program that
> tars large files stored in Hadoop. It currently fails on a file that is
> 17 GB in size. The same code works without any problem for smaller files:
> I tar smaller HDFS files all day long, and it fails only on that 17 GB
> file. After three days of reading the source code, I still can’t make
> sense of the error message. The exact file size at the time of the error
> is 17456999265 bytes. The exception I’m seeing is:
>
> 12/19/11 5:54 PM [BDM.main] EXCEPTION request to write '65535' bytes exceeds size in header of '277130081' bytes
> 12/19/11 5:54 PM [BDM.main] EXCEPTION org.apache.tools.tar.TarOutputStream.write(TarOutputStream.java:238)
> 12/19/11 5:54 PM [BDM.main] EXCEPTION com.yahoo.ads.ngdstone.tpbdm.HDFSTar.archive(HDFSTar.java:149)
>
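The header size in the exception is not arbitrary. 17456999265 = 4 × 2^32 + 277130081 = 2 × 2^33 + 277130081, so the value in the header is the real file size with its high-order bits dropped. That fits either the size being narrowed to a 32-bit int somewhere along the way, or the ustar header's size field, which holds at most 11 octal digits (8^11 - 1 = 8589934591 bytes, just under 8 GiB). The small Java check below only confirms the arithmetic; the class and method names are hypothetical, and it does not establish which of the two truncations Ant 1.8.2 actually performs.

```java
// Hypothetical helper class; only the two sizes come from the report above.
public class TarSizeWrap {
    // Largest value an 11-digit octal ustar size field can hold: 8^11 - 1 = 2^33 - 1.
    static final long USTAR_MAX_SIZE = 077777777777L;

    // Keep only the low 32 bits, as an (int) cast of the size would.
    static long low32(long size) {
        return size & 0xFFFFFFFFL;
    }

    // Keep only the low 33 bits, as keeping just the last 11 octal digits would.
    static long low33(long size) {
        return size & USTAR_MAX_SIZE;
    }

    // True if the size can be written into a plain ustar size field without loss.
    static boolean fitsUstarSizeField(long size) {
        return size >= 0 && size <= USTAR_MAX_SIZE;
    }

    public static void main(String[] args) {
        long actual = 17456999265L; // file size reported in the message
        System.out.println(low32(actual));              // 277130081
        System.out.println(low33(actual));              // 277130081
        System.out.println(fitsUstarSizeField(actual)); // false
    }
}
```

Either way, a 17 GB entry cannot be described by a plain ustar size field, which is consistent with the smaller files going through fine.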
> My code is:
>
>     TarEntry entry = new TarEntry(p.getName());
>     Path absolutePath = p.isAbsolute() ? p : new Path(baseDir, p); // HDFS Path
>     FileStatus fileStatus = fs.getFileStatus(absolutePath); // HDFS fileStatus
>     entry.setNames(fileStatus.getOwner(), fileStatus.getGroup());
>     entry.setUserName(user);
>     entry.setGroupName(group);
>     entry.setName(name);
>     entry.setSize(fileStatus.getLen());
>     entry.setMode(Integer.parseInt("0100" + permissions, 8));
>     out.putNextEntry(entry); // out = TarOutputStream
>
>     if (fileStatus.getLen() > 0) {
>         InputStream in = fs.open(absolutePath); // large file in HDFS
>         try {
>             ++nEntries;
>             int bytesRead = in.read(buf);
>             while (bytesRead >= 0) {
>                 out.write(buf, 0, bytesRead);
>                 bytesRead = in.read(buf);
>             }
>         } finally {
>             in.close();
>         }
>     }
>
>     out.closeEntry();
>
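The mode line in the code above builds a string and parses it as octal. Assuming `permissions` holds three octal digits such as "644" (the code does not show its format, so this is an assumption), "0100" + "644" becomes 0100644: the 0100000 regular-file type bits (S_IFREG in a Unix st_mode) combined with rw-r--r-- permissions. The hypothetical class below simply reproduces that expression so the resulting value is visible.

```java
// Hypothetical demo class; assumes `permissions` is a three-digit octal string such as "644".
public class TarModeDemo {
    // 0100000 (octal) is the regular-file type bit pattern (S_IFREG) in a Unix st_mode.
    static final int REGULAR_FILE = 0100000;
    // 0170000 (octal) masks the file-type bits (S_IFMT).
    static final int TYPE_MASK = 0170000;

    // Same expression as in the quoted code: "0100" + "644" -> "0100644", parsed as octal.
    static int tarMode(String permissions) {
        return Integer.parseInt("0100" + permissions, 8);
    }

    public static void main(String[] args) {
        int mode = tarMode("644");
        System.out.println(Integer.toOctalString(mode));            // 100644
        System.out.println((mode & TYPE_MASK) == REGULAR_FILE);     // true
        System.out.println(Integer.toOctalString(mode & 0777));     // 644
    }
}
```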
> Any ideas? Am I missing something in the way I’m setting up the
> TarOutputStream or TarEntry? Or does tar have implicit limits that will
> never work for multi-gigabyte files?
>
> Thanks!
>
> Frank
>
