hadoop-common-user mailing list archives

From Kris Jirapinyo <kjirapi...@biz360.com>
Subject Extra 4 bytes at beginning of serialized file
Date Wed, 12 Aug 2009 01:33:32 GMT
Hi all,
   Has anyone encountered 4 extra bytes at the beginning of a serialized
object file written through MultipleOutputFormat?  I am using BytesWritable
to write the serialized byte arrays in the reduce phase.  My writer is a
generic one:

public class GenericOutputFormat extends FileOutputFormat<Writable, Writable> {

    @Override
    public RecordWriter<Writable, Writable> getRecordWriter(FileSystem ignored,
            JobConf job, String name, Progressable progress)
        throws IOException {
        Path file = FileOutputFormat.getTaskOutputPath(job, name);
        FileSystem fs = file.getFileSystem(job);
        FSDataOutputStream fileOut = fs.create(file, progress);
        return new GenericWriter(fileOut);
    }

    static class GenericWriter implements RecordWriter<Writable, Writable> {
        protected DataOutputStream out;

        GenericWriter(DataOutputStream out) {
            this.out = out;
        }

        @Override
        public synchronized void close(Reporter reporter) throws IOException {
            out.close();
        }

        @Override
        public synchronized void write(Writable key, Writable value)
                throws IOException {
            key.write(out);
        }
    }
}

Basically, it just writes the key to the DataOutputStream and ignores the
value.  When I debugged, I printed the size of the byte array in the
BytesWritable, and the resulting file is always 4 bytes larger than that
number.  Any ideas?

-- Kris.
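
A possible explanation, sketched under assumptions: BytesWritable.write()
appears to serialize a 4-byte big-endian length before the payload, which
would account for a file exactly 4 bytes larger than the array.  The class
below does not use Hadoop at all.  LengthPrefixDemo, writeWithLengthPrefix
and writeRaw are hypothetical names that only mimic that layout with plain
java.io so the size difference can be checked in isolation.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class LengthPrefixDemo {

    // Hypothetical stand-in for what BytesWritable.write(DataOutput) is
    // assumed to do: a 4-byte big-endian length, then the payload bytes.
    static byte[] writeWithLengthPrefix(byte[] payload) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(payload.length);           // the 4 extra bytes
            out.write(payload, 0, payload.length);  // the actual data
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Writes only the payload, with no length header.
    static byte[] writeRaw(byte[] payload) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.write(payload, 0, payload.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = {10, 20, 30};
        System.out.println("payload:  " + payload.length + " bytes");
        System.out.println("prefixed: " + writeWithLengthPrefix(payload).length + " bytes");
        System.out.println("raw:      " + writeRaw(payload).length + " bytes");
    }
}
```

If that assumption holds, a writer that wants only the raw bytes would need
to cast the key to BytesWritable and write its backing array up to its valid
length itself, instead of calling key.write(out).  The exact accessor names
depend on the Hadoop version in use.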
