avro-user mailing list archives

From David Rosenstrauch <dar...@darose.net>
Subject Re: Setting bytes in Java
Date Tue, 18 Jan 2011 18:08:57 GMT
I've also found this to be the case, and was wondering about it.  I had
also assumed that I could just re-init an existing BinaryEncoder, but
found that I had to create a new one each time.  I didn't think much of
it at the time, but in retrospect it does sound like it might be a bug.
Perhaps one of the devs can comment further.  (You might also want to
open a bug report about this.)
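
To make the symptom Dev describes below concrete, here is a minimal toy model of that kind of bug. ToyEncoder and ToyByteWriter are hypothetical names made up for illustration; this is not Avro's actual source, only a sketch that assumes the encoder keeps an inner writer that captured the original stream. ByteArrayOutputStream is used as the stream type purely to keep the sketch free of checked exceptions.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper: remembers the stream it was constructed with.
class ToyByteWriter {
    private final ByteArrayOutputStream out;
    ToyByteWriter(ByteArrayOutputStream out) { this.out = out; }
    void write(byte[] bytes) { out.write(bytes, 0, bytes.length); }
}

// Hypothetical encoder: NOT Avro's BinaryEncoder, just a model of the symptom.
class ToyEncoder {
    private ByteArrayOutputStream out;
    private ToyByteWriter byteWriter;

    ToyEncoder(ByteArrayOutputStream out) {
        this.out = out;
        this.byteWriter = new ToyByteWriter(out);
    }

    // Buggy re-init: swaps the encoder's stream, leaves the inner writer on the old one.
    void initBuggy(ByteArrayOutputStream newOut) {
        this.out = newOut;
    }

    // Fixed re-init: also rebuilds the inner writer around the new stream.
    void initFixed(ByteArrayOutputStream newOut) {
        this.out = newOut;
        this.byteWriter = new ToyByteWriter(newOut);
    }

    // Writes a one-byte length prefix directly, then the payload via the inner writer.
    void writeBytes(byte[] payload) {
        out.write(payload.length);
        byteWriter.write(payload);
    }
}

class ToyEncoderDemo {
    // Returns how many bytes reach the NEW stream after re-initializing the encoder.
    static int bytesAfterReinit(boolean fixed) {
        ByteArrayOutputStream first = new ByteArrayOutputStream();
        ByteArrayOutputStream second = new ByteArrayOutputStream();
        ToyEncoder enc = new ToyEncoder(first);
        if (fixed) {
            enc.initFixed(second);
        } else {
            enc.initBuggy(second);
        }
        enc.writeBytes(new byte[] {1, 2, 3});
        return second.size();
    }

    public static void main(String[] args) {
        System.out.println("buggy init: " + bytesAfterReinit(false) + " byte(s) in new stream");
        System.out.println("fixed init: " + bytesAfterReinit(true) + " byte(s) in new stream");
    }
}
```

In this toy model the buggy re-init delivers only the length prefix to the new stream, while the payload lands in the old one. Whether the real init() behaves this way is for the devs to confirm; until then, creating a new encoder per write is the workaround that is known to work.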


On 01/18/2011 03:17 AM, Devajyoti Sarkar wrote:
> Let me first give some context. I would like to store a datum serialized
> with a BinaryEncoder without having to place a schema with it (as the
> DataFileWriter does). Instead I have created a container record that stores
> a unique id for the schema version and a payload field of type "bytes". This
> allows me to have a self-describing data object (for example, to place in a
> cell in HBase) without the overhead of a schema per object. (Perhaps there
> is a better way to do this; if so, please let me know.)
> The code looks something like this:
>      GenericRecord container = new GenericData.Record(containerSchema);
>      writer.setSchema(containerSchema);
>      container.put(CONTAINER_SCHEMA_ID_FIELD, datum.getSchema().getProp(SCHEMA_ID_PROPERTY));
>      container.put(CONTAINER_PAYLOAD_FIELD, ByteBuffer.wrap(datumBits.toByteArray()));
>      ByteArrayOutputStream containerBits = new ByteArrayOutputStream();
>      encoder.init(containerBits);
>      writer.write(container, encoder);
>      encoder.flush();
>      containerBits.flush();
>      containerBits.close();
> I am trying to reuse an encoder by calling init() to re-initialize it.
> Perhaps this is what creates the problem. If I create a new encoder each
> time, everything works fine. However, if I just use init(), then the
> OutputStream for the encoder is reset but the OutputStream for the
> SimpleByteWriter within the encoder is not. This seems to be causing the
> problem because when the encoder is flushed, it does not write the bytes in
> the ByteWriter. Perhaps the init() method is not supposed to be used this
> way. But it would be nice to not have to create a new encoder each time.
> Can you please let me know if the above looks right and advise me as to what
> is the best way to do the serialization.
> Thanks,
> Dev
> On Tue, Jan 18, 2011 at 4:14 AM, Scott Carey <scott@richrelevance.com> wrote:
>> BinaryEncoder buffers data; you may have to call flush() to see it in the
>> output stream.
>> On 1/17/11 4:53 AM, "Devajyoti Sarkar" <dsarkar@q-kk.com> wrote:
>> Hi,
>> I am just beginning to use Avro, so I apologize if this is a silly
>> question.
>> I would like to set a field of type "bytes" in Java. I am assuming that all
>> I need to do is wrap a byte[] in a ByteBuffer to set the value.
>> Unfortunately, that does not seem to work. I am using a BinaryEncoder and,
>> looking at its output, it has not written any of the bytes that were in the
>> array. The first four values of the array are 0, -128, -128, -128.
>> Is it because Java uses 8-bit signed bytes while the Avro spec calls for
>> 8-bit unsigned bytes in a field of type "bytes"? If so, how does one convert
>> Java bytes to the kind accepted by Avro?
>> Thanks in advance.
>> Dev
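
On the signed versus unsigned question in the original message: as far as I know, no conversion is needed. A Java byte and an unsigned 8-bit value share the same bit pattern; only the interpretation differs. Below is a small sketch using only the standard library. The class and method names (SignedBytesDemo, toUnsigned, wrap) are made up for illustration and are not part of Avro.

```java
import java.nio.ByteBuffer;

class SignedBytesDemo {
    // Interpret a Java (signed) byte as the unsigned value 0..255 with the same bits.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Wrap a byte[] in a ByteBuffer; the bits are passed through untouched.
    static ByteBuffer wrap(byte[] raw) {
        return ByteBuffer.wrap(raw);
    }

    public static void main(String[] args) {
        // The first four values mentioned in the question.
        byte[] raw = {0, -128, -128, -128};
        ByteBuffer buf = wrap(raw);
        while (buf.hasRemaining()) {
            // Prints 0, 128, 128, 128 (one value per line).
            System.out.println(toUnsigned(buf.get()));
        }
    }
}
```

So a Java -128 is the same octet as an unsigned 128 (0x80), and wrapping the array with ByteBuffer.wrap() should be enough. The missing output is more likely explained by the buffering and re-init behavior discussed in the rest of the thread.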
