ant-user mailing list archives

From Frank Astier <fast...@yahoo-inc.com>
Subject Question about using Tar with Hadoop files
Date Mon, 19 Dec 2011 18:25:40 GMT
Hi -

I’m trying to use the Apache Ant tar package (org.apache.tools.tar, version 1.8.2) in a Java
program that tars large files stored in Hadoop (HDFS). The same code tars smaller HDFS files
all day long without any problem; it fails only on one file that is about 17 GB. I have been
reading the source code for 3 days now and still have a hard time making sense of the error
message. The exact file size at the time of the error is 17456999265 bytes. The exception
I’m seeing is:

12/19/11 5:54 PM [BDM.main] EXCEPTION request to write '65535' bytes exceeds size in header of '277130081' bytes
12/19/11 5:54 PM [BDM.main] EXCEPTION org.apache.tools.tar.TarOutputStream.write(TarOutputStream.java:238)
12/19/11 5:54 PM [BDM.main] EXCEPTION com.yahoo.ads.ngdstone.tpbdm.HDFSTar.archive(HDFSTar.java:149)
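
One observation about the numbers in that message: the classic tar header stores the entry size as 11 octal digits, which can represent at most 8^11 - 1 = 2^33 - 1 bytes (just under 8 GiB). The size reported in the exception, 277130081, is exactly the real file size modulo 2^33, which is consistent with the size being truncated when the header is written. Whether that is what this version of the library does internally is an assumption, not something confirmed here. A minimal sketch of the arithmetic (the class and method names are hypothetical, chosen only for illustration):

```java
public class TarSizeCheck {
    // 8^11 = 2^33: one more than the largest value that fits in 11 octal digits.
    static final long OCTAL_11_DIGIT_LIMIT = 1L << 33;

    // What remains of a size if only its low 33 bits (11 octal digits) survive.
    static long truncatedSize(long size) {
        return size % OCTAL_11_DIGIT_LIMIT;
    }

    // True if the size fits in the classic 11-digit octal size field.
    static boolean fitsInClassicHeader(long size) {
        return size >= 0 && size < OCTAL_11_DIGIT_LIMIT;
    }

    public static void main(String[] args) {
        long fileSize = 17456999265L; // exact size quoted above
        System.out.println(truncatedSize(fileSize));       // prints 277130081
        System.out.println(fitsInClassicHeader(fileSize)); // prints false
    }
}
```

A check like fitsInClassicHeader before calling putNextEntry would at least turn the confusing mid-stream exception into an early, explicit failure.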

My code is:

    TarEntry entry = new TarEntry(p.getName());
    Path absolutePath = p.isAbsolute() ? p : new Path(baseDir, p); // HDFS Path
    FileStatus fileStatus = fs.getFileStatus(absolutePath); // HDFS fileStatus
    entry.setNames(fileStatus.getOwner(), fileStatus.getGroup());
    entry.setUserName(user);
    entry.setGroupName(group);
    entry.setName(name);
    entry.setSize(fileStatus.getLen());
    entry.setMode(Integer.parseInt("0100" + permissions, 8));
    out.putNextEntry(entry); // out = TarOutputStream

    if (fileStatus.getLen() > 0) {
        InputStream in = fs.open(absolutePath); // large file in HDFS
        try {
            ++nEntries;
            int bytesRead = in.read(buf);
            while (bytesRead >= 0) {
                out.write(buf, 0, bytesRead);
                bytesRead = in.read(buf);
            }
        } finally {
            in.close();
        }
    }

    out.closeEntry();

Any ideas? Am I missing anything in the way I’m setting up the TarOutputStream or TarEntry?
Or does tar have implicit limits that will never work for multi-gigabyte files?

Thanks!

Frank
