hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Extra 4 bytes at beginning of serialized file
Date Wed, 12 Aug 2009 02:03:39 GMT
BytesWritable serializes itself by first writing the array length as a 4-byte
int, and then writing the array bytes themselves. The 4 extra bytes at the top
of the file are that length prefix.
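A minimal pure-Java sketch of that layout, assuming only java.io and not Hadoop itself: the hypothetical helper writeWithLengthPrefix mimics what BytesWritable.write() emits (a 4-byte big-endian int length, then the bytes), while the hypothetical writeRaw writes only the payload, so the size difference is exactly the 4-byte header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class LengthPrefixDemo {

    // Mirrors the BytesWritable wire layout: writeInt(length), then the bytes.
    static byte[] writeWithLengthPrefix(byte[] payload) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(payload.length);              // 4-byte big-endian header
        out.write(payload, 0, payload.length);     // the actual data
        out.flush();
        return buf.toByteArray();
    }

    // Writes only the payload, with no length header.
    static byte[] writeRaw(byte[] payload) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.write(payload, 0, payload.length);
        out.flush();
        return buf.toByteArray();
    }

    // Reads the 4-byte length header back from a length-prefixed record.
    static int readLengthPrefix(byte[] record) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
        return in.readInt();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] prefixed = writeWithLengthPrefix(payload);
        byte[] raw = writeRaw(payload);
        System.out.println("payload bytes:  " + payload.length);             // 5
        System.out.println("prefixed bytes: " + prefixed.length);            // 9
        System.out.println("raw bytes:      " + raw.length);                 // 5
        System.out.println("header value:   " + readLengthPrefix(prefixed)); // 5
    }
}
```

In the quoted GenericWriter, if the file should contain only the raw bytes, one option (untested here) is to cast the key to BytesWritable and call out.write(bw.getBytes(), 0, bw.getLength()) instead of key.write(out). Using getLength() matters because the backing array returned by getBytes() can be larger than the valid data.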

Hope that helps
-Todd

On Tue, Aug 11, 2009 at 6:33 PM, Kris Jirapinyo <kjirapinyo@biz360.com> wrote:

> Hi all,
>   I was wondering if anyone has encountered 4 extra bytes at the beginning
> of a serialized object file when using MultipleOutputFormat. Basically, I
> am using BytesWritable to write the serialized byte arrays in the reduce
> phase. My writer is a generic one:
>
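> import java.io.DataOutputStream;
> import java.io.IOException;
>
> import org.apache.hadoop.fs.FSDataOutputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.Writable;
> import org.apache.hadoop.mapred.FileOutputFormat;
> import org.apache.hadoop.mapred.JobConf;
> import org.apache.hadoop.mapred.RecordWriter;
> import org.apache.hadoop.mapred.Reporter;
> import org.apache.hadoop.util.Progressable;
>
> public class GenericOutputFormat extends FileOutputFormat<Writable, Writable> {
>
>     @Override
>     public RecordWriter<Writable, Writable> getRecordWriter(FileSystem ignored,
>             JobConf job, String name, Progressable progress) throws IOException {
>         // Resolve this task's output file and open it for writing.
>         Path file = FileOutputFormat.getTaskOutputPath(job, name);
>         FileSystem fs = file.getFileSystem(job);
>         FSDataOutputStream fileOut = fs.create(file, progress);
>         return new GenericWriter(fileOut);
>     }
>
>     static class GenericWriter implements RecordWriter<Writable, Writable> {
>         protected DataOutputStream out;
>
>         GenericWriter(DataOutputStream out) {
>             this.out = out;
>         }
>
>         @Override
>         public synchronized void close(Reporter reporter) throws IOException {
>             out.close();
>         }
>
>         @Override
>         public synchronized void write(Writable key, Writable value) throws IOException {
>             // Delegates to the key's own Writable serialization; the value is ignored.
>             key.write(out);
>         }
>     }
> }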
>
> Basically, it'll just write out whatever is in the DataOutputStream. When I
> debugged, I printed out the size of the byte array in the BytesWritable, and
> the resulting file is always 4 bytes larger than that number. Any ideas?
>
> -- Kris.
>
