avro-user mailing list archives

From Devajyoti Sarkar <dsar...@q-kk.com>
Subject Re: Setting bytes in Java
Date Wed, 19 Jan 2011 07:18:36 GMT
It has been filed as AVRO-738.

Thanks for the links.

Dev

On Wed, Jan 19, 2011 at 12:00 AM, Scott Carey <scott@richrelevance.com> wrote:

> Please open a bug report in JIRA.  I don't have time to look at this now,
> but someone else might.
>
>
> On the topic of per-record versioning and how to design a system that does
> not store schemas per record, there have been useful threads on this
> mailing list in the past:
>
>
> http://search-hadoop.com/m/66jvQoopYw/HAvroBase&subj=Re+question+about+completely+untagged+data+
>
> http://search-hadoop.com/m/q7lLU1GVhHd2/HAvroBase&subj=Re+Versioning+of+an+array+of+a+record
>
> On 1/18/11 10:08 AM, "David Rosenstrauch" <darose@darose.net> wrote:
>
> >I've also found this to be the case, and was wondering about it.  I also
> >had thought that I could just re-init an existing BinaryEncoder, but
> >found that I had to create a new one each time.  I didn't really think
> >much of it at the time, but in retrospect it does sound like it might be
> >a bug.  Perhaps one of the devs can comment more.  (And/or perhaps you
> >might want to open a bug report about this.)
> >
> >DR
> >
> >On 01/18/2011 03:17 AM, Devajyoti Sarkar wrote:
> >> Let me first give some context. I would like to store a datum serialized
> >> with a BinaryEncoder without having to place a schema with it (as the
> >> DataFileWriter does). Instead I have created a container record that stores
> >> a unique id for the schema version and a payload field of type "bytes". This
> >> allows me to have a self-describing data object (for example, to place in a
> >> cell in HBase) without the overhead of a schema per object. (Perhaps there
> >> is a better way to do this; if so, please let me know.)
> >>
> >> The code looks something like this:
> >>
> >>      GenericRecord container = new GenericData.Record(containerSchema);
> >>      writer.setSchema(containerSchema);
> >>      container.put(CONTAINER_SCHEMA_ID_FIELD,
> >>          datum.getSchema().getProp(SCHEMA_ID_PROPERTY));
> >>      container.put(CONTAINER_PAYLOAD_FIELD,
> >>          ByteBuffer.wrap(datumBits.toByteArray()));
> >>      ByteArrayOutputStream containerBits = new ByteArrayOutputStream();
> >>      encoder.init(containerBits);
> >>      writer.write(container, encoder);
> >>      encoder.flush();
> >>      containerBits.flush();
> >>      containerBits.close();
> >>
> >> I am trying to reuse an encoder by calling init() to re-initialize it.
> >> Perhaps this is what creates the problem. If I create a new encoder each
> >> time everything works fine. However, if I just use init, then the
> >> OutputStream for the encoder is reset but the OutputStream for the
> >> SimpleByteWriter within the encoder is not. This seems to be causing the
> >> problem because when the encoder is flushed, it does not write the bytes in
> >> the ByteWriter. Perhaps the init() method is not supposed to be used this
> >> way. But it would be nice to not have to create a new encoder each time.
> >>
> >> Can you please let me know if the above looks right and advise me as to what
> >> is the best way to do the serialization.
> >>
> >> Thanks,
> >> Dev
> >>
> >>
> >>
> >> On Tue, Jan 18, 2011 at 4:14 AM, Scott Carey <scott@richrelevance.com> wrote:
> >>
> >>> BinaryEncoder buffers data; you may have to call flush() to see it in the
> >>> output stream.
> >>>
> >>>
> >>> On 1/17/11 4:53 AM, "Devajyoti Sarkar" <dsarkar@q-kk.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I am just beginning to use Avro, so I apologize if this is a silly
> >>> question.
> >>>
> >>> I would like to set a field of type "bytes" in Java. I am assuming that all
> >>> I need to do is wrap a byte[] in a ByteBuffer to set the value.
> >>> Unfortunately that does not seem to work. I am using a BinaryEncoder and
> >>> looking at its output, it has not written any of the bytes that were in the
> >>> array. The first four values of the array are 0, -128, -128, -128.
> >>>
> >>> Is it because Java uses 8-bit signed bytes while the Avro spec calls for
> >>> 8-bit unsigned bytes in a field of type "bytes"? If so, how does one convert
> >>> Java bytes to the kind accepted by Avro?
> >>>
> >>> Thanks in advance.
> >>>
> >>> Dev
> >>>
> >>>
> >>
> >
>
>
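
On the signed-versus-unsigned question in the original message, a minimal
sketch using only the JDK (no Avro calls) may help. It assumes nothing about
Avro internals: the class name BytesFieldSketch and its helper methods are
made up for illustration, and BufferedOutputStream is only a stand-in for an
encoder that buffers its output. The sketch shows that a signed Java byte and
an unsigned 8-bit value share the same bit pattern, so wrapping a byte[] in a
ByteBuffer needs no conversion, and that bytes written through a buffering
layer do not reach the underlying stream until flush() is called.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical helper class for illustration; not part of Avro.
class BytesFieldSketch {

    // Read a signed Java byte as the unsigned 8-bit value with the same bits.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Wrap the array without copying; the buffer exposes the same bytes.
    static ByteBuffer wrap(byte[] raw) {
        return ByteBuffer.wrap(raw);
    }

    // Stand-in for a buffering encoder: returns how many bytes have reached
    // the sink before and after flush() is called on the buffering layer.
    static int[] sizesAroundFlush(byte[] raw) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream buffered = new BufferedOutputStream(sink);
        try {
            buffered.write(raw, 0, raw.length);
            int before = sink.size();   // still held in the buffer
            buffered.flush();
            int after = sink.size();    // now visible in the sink
            return new int[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = { 0, -128, -128, -128 };   // values from the question
        for (byte b : raw) {
            System.out.println(b + " as unsigned: " + toUnsigned(b));
        }
        int[] sizes = sizesAroundFlush(raw);
        System.out.println("bytes in sink before flush: " + sizes[0]);
        System.out.println("bytes in sink after flush:  " + sizes[1]);
    }
}
```

If this reading is right, the missing output described in the thread is a
buffering or encoder-reuse issue rather than a signedness issue, which is
consistent with the bug report filed as AVRO-738.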
