hadoop-common-user mailing list archives

From Brian Harrington <...@yahoo-inc.com>
Subject Re: Using the DataOutput/InputBuffer classes
Date Sat, 28 Jul 2007 03:29:10 GMT
From the API docs for DataOutputBuffer.getData(): "Returns the current contents of
the buffer. Data is only valid to getLength()."
<http://lucene.apache.org/hadoop/api/org/apache/hadoop/io/DataOutputBuffer.html#getLength%28%29>

Try:

raf.write(buffer.getData(), 0, buffer.getLength());
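The following self-contained sketch shows why the extra bytes appear. It does not use Hadoop's actual source. ReusableBuffer is a hypothetical stand-in built on java.io.ByteArrayOutputStream. Like DataOutputBuffer per the docs quoted above, it hands out its internal array, and that array is normally longer than the data written into it. The unused tail is zero bytes, which is presumably what showed up as '@' (vi and less render NUL as ^@). reset() rewinds the write position but does not shrink the array, so after one large record, every later full-array write is large too. The String records and writeUTF serialization are illustrative assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class BufferContractDemo {

    // Hypothetical stand-in for Hadoop's DataOutputBuffer (not its real source):
    // it hands out the internal array of ByteArrayOutputStream without copying.
    static final class ReusableBuffer extends ByteArrayOutputStream {
        byte[] getData() { return buf; }   // entire backing array, unused tail included
        int getLength()  { return count; } // number of bytes actually written
    }

    // Simulates the loop from the question: serialize each record into the
    // reused buffer, then append it to a sink that stands in for the file.
    static int totalBytesWritten(String[] records, boolean validPrefixOnly) throws IOException {
        ReusableBuffer buffer = new ReusableBuffer();
        DataOutputStream out = new DataOutputStream(buffer);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();

        for (String record : records) {
            buffer.reset();          // rewinds the count but keeps the (possibly grown) array
            out.writeUTF(record);    // 2-byte length prefix + modified UTF-8 bytes
            out.flush();
            if (validPrefixOnly) {
                sink.write(buffer.getData(), 0, buffer.getLength()); // the suggested fix
            } else {
                sink.write(buffer.getData());                        // writes the whole array
            }
        }
        return sink.size();
    }

    public static void main(String[] args) throws IOException {
        String[] records = {"a fairly long record that forces the buffer to grow", "x", "y"};
        System.out.println("valid prefix only: " + totalBytesWritten(records, true));
        System.out.println("whole array:       " + totalBytesWritten(records, false));
    }
}
```

With these three records, the valid-prefix variant writes 59 bytes: 53 for the first record (a 2-byte prefix plus 51 characters), then 3 bytes each for the other two. The whole-array variant writes considerably more, because every record carries the full grown array.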

Brian


Phantom wrote:
> Hi All
>
> I have been trying to use the DataOutputBuffer class for its obvious memory
> efficiency. I write some data into the buffer and then write the buffer
> into a file (an instance of RandomAccessFile) by passing buffer.getData()
> to raf.write(). However, a lot of garbage is being written into the file,
> which shows up as a series of '@' characters on Linux and as spaces on
> Windows.
>
> This is my usage:
>
> DataOutputBuffer buffer = new DataOutputBuffer();
> RandomAccessFile raf = new RandomAccessFile(file, "rw");
>
> for ( each data in some data structure )
> {
>     buffer.reset();
>     serialize data into buffer;
>     raf.write(buffer.getData());
> }
>
> When I use a ByteArrayOutputStream and a DataOutputStream for the same task,
> the generated file is 29K. However, when I use DataOutputBuffer, the file
> for the same dataset is 507K. Is my usage correct?
>
> Please advise.
>
> Thanks
> A
>
>   
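For comparison, the ByteArrayOutputStream/DataOutputStream version in the question presumably produced the expected 29K because ByteArrayOutputStream.toByteArray() returns a copy holding exactly size() bytes. Below is a minimal sketch of that loop against a real RandomAccessFile. It assumes each record is a String serialized with writeUTF, which is a hypothetical choice because the question does not say how the data is serialized. The helper name writeRecords is also hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;
import java.util.List;

public class RecordWriterSketch {

    // Hypothetical helper: writes each record through a reused in-memory buffer
    // into a RandomAccessFile and returns the resulting file length.
    static long writeRecords(File file, List<String> records) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0);              // start from an empty file
            for (String record : records) {
                bytes.reset();             // reuse the buffer for the next record
                out.writeUTF(record);      // assumed serialization: one UTF string
                out.flush();
                // toByteArray() returns a copy holding exactly size() bytes,
                // so no unused tail reaches the file.
                raf.write(bytes.toByteArray());
                // Zero-copy equivalent with Hadoop's DataOutputBuffer:
                //   raf.write(buffer.getData(), 0, buffer.getLength());
            }
            return raf.length();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("records", ".bin");
        file.deleteOnExit();
        long length = writeRecords(file, Arrays.asList("first", "second", "third"));
        System.out.println("file length: " + length); // prints "file length: 22"
    }
}
```

The per-record copy made by toByteArray() is the overhead that DataOutputBuffer avoids. With DataOutputBuffer, the same loop only needs the three-argument raf.write(...) from the reply.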

