hadoop-common-user mailing list archives

From Kris Jirapinyo <kris.jirapi...@biz360.com>
Subject Re: Extra 4 bytes at beginning of serialized file
Date Wed, 12 Aug 2009 02:23:41 GMT
Ah that explains it, thanks Todd.  Is there a way to serialize an object
without using BytesWritable, or some way I can have a "perfect" serialized
file so I won't have to keep discarding the first 4 bytes of the files?

-- Kris.

On Tue, Aug 11, 2009 at 7:03 PM, Todd Lipcon <todd@cloudera.com> wrote:

> BytesWritable serializes itself by first outputting the array length, and
> then outputting the array itself. The 4 bytes at the top of the file are the
> length of the value itself.
>
> Hope that helps
> -Todd
>
> On Tue, Aug 11, 2009 at 6:33 PM, Kris Jirapinyo <kjirapinyo@biz360.com> wrote:
>
> > Hi all,
> >   I was wondering if anyone's encountered 4 extra bytes at the beginning of
> > the serialized object file using MultipleOutputFormat.  Basically, I am
> > using BytesWritable to write the serialized byte arrays in the reducer
> > phase.  My writer is a generic one:
> >
> > public class GenericOutputFormat extends FileOutputFormat<Writable, Writable> {
> >
> >    @Override
> >    public RecordWriter<Writable, Writable> getRecordWriter(FileSystem ignored,
> >            JobConf job, String name, Progressable progress) throws IOException {
> >        Path file = FileOutputFormat.getTaskOutputPath(job, name);
> >        FileSystem fs = file.getFileSystem(job);
> >        FSDataOutputStream fileOut = fs.create(file, progress);
> >        return new GenericWriter(fileOut);
> >    }
> >
> >    static class GenericWriter implements RecordWriter<Writable, Writable> {
> >        protected DataOutputStream out;
> >
> >        GenericWriter(DataOutputStream out) {
> >            this.out = out;
> >        }
> >
> >        @Override
> >        public synchronized void close(Reporter reporter) throws IOException {
> >            out.close();
> >        }
> >
> >        @Override
> >        public synchronized void write(Writable key, Writable value)
> >                throws IOException {
> >            key.write(out);
> >        }
> >    }
> > }
> >
> > Basically, it'll just write out whatever is in the DataOutputStream.  When I
> > debugged, I printed out the size of the byte array in the BytesWritable, and
> > the resulting file is always 4 bytes larger than that number.  Any ideas?
> >
> > -- Kris.
> >
>
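
To illustrate the layout described in this thread (a 4-byte length followed by
the array itself), here is a minimal sketch that uses only java.io and no
Hadoop classes. The class name LengthPrefixDemo and its two methods are
hypothetical, introduced only for this illustration; withLengthPrefix imitates
the framing attributed to BytesWritable above, and rawOnly writes just the
payload, which is one possible way to avoid the extra bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; it only imitates the on-disk layout discussed above.
class LengthPrefixDemo {

    // Writes a 4-byte big-endian length followed by the payload,
    // i.e. the "length, then array" framing described in the thread.
    static byte[] withLengthPrefix(byte[] payload) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(payload.length);
            out.write(payload, 0, payload.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Writes only the payload bytes, with no length header.
    static byte[] rawOnly(byte[] payload) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.write(payload, 0, payload.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = {10, 20, 30};
        System.out.println(withLengthPrefix(payload).length); // prints 7
        System.out.println(rawOnly(payload).length);          // prints 3
    }
}
```

In a RecordWriter like the one quoted above, the same idea would presumably
mean casting the key to BytesWritable and writing its backing array directly,
for example out.write(bw.getBytes(), 0, bw.getLength()), instead of calling
key.write(out). Those accessor names are an assumption about the Hadoop version
in use (older releases may expose get() and getSize() instead), so check the
BytesWritable Javadoc for your release.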
