avro-dev mailing list archives

From Sean Busbey <bus...@cloudera.com>
Subject Re: [DISCUSS][JAVA] Generating toBytes/fromBytes methods?
Date Wed, 23 Dec 2015 00:41:54 GMT
Including a schema fingerprint at the start of each record:

1) reuses stuff we have
2) gives a language independent notion of compatibility
3) doesn't bind how folks get stuff in/out of the single record form.
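
A minimal sketch of that framing, using only the JDK. The class name FingerprintFraming, the 8-byte width, and the little-endian byte order are assumptions made for illustration, not an agreed format. In Avro the fingerprint value itself could come from SchemaNormalization.parsingFingerprint64(schema); here it is simply passed in as a long.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper: prefixes an encoded record with a 64-bit schema fingerprint.
final class FingerprintFraming {
    // Assumption: the fingerprint occupies the first 8 bytes of the framed record.
    private static final int FINGERPRINT_BYTES = Long.BYTES;

    private FingerprintFraming() {
    }

    // Returns fingerprint (8 bytes, little-endian by assumption) followed by the payload.
    static byte[] frame(long fingerprint, byte[] payload) {
        return ByteBuffer.allocate(FINGERPRINT_BYTES + payload.length)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(fingerprint)
                .put(payload)
                .array();
    }

    // Reads the fingerprint back so the reader can look up the writer schema.
    static long fingerprintOf(byte[] framed) {
        requireHeader(framed);
        return ByteBuffer.wrap(framed).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    // Returns the encoded record without the fingerprint prefix.
    static byte[] payloadOf(byte[] framed) {
        requireHeader(framed);
        return Arrays.copyOfRange(framed, FINGERPRINT_BYTES, framed.length);
    }

    private static void requireHeader(byte[] framed) {
        if (framed.length < FINGERPRINT_BYTES) {
            throw new IllegalArgumentException("record is shorter than the fingerprint prefix");
        }
    }
}
```

The payload stays opaque to the framing, so how the record bytes are produced or consumed is left to the caller.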

-- 
Sean Busbey
On Dec 22, 2015 06:52, "Niels Basjes" <Niels@basjes.nl> wrote:

> I was not clear enough in my previous email.
> What I meant is to 'wrap' the application schema in a serialization wrapper
> schema that has a field indicating the "schema classname".
> That (generic setup) combined with some generated code in the schema
> classes should yield a solution that supports schema migration.
>
> Niels
>
> On Tue, Dec 22, 2015 at 11:55 AM, Niels Basjes <Niels@basjes.nl> wrote:
>
> > Thanks for pointing this out.
> > This is exactly what I was working on.
> >
> > The way I solved the 'does the schema match' question at work is by
> > requiring that all schemas start with a single text field "schema
> > classname" holding the full class name of the class that was used to
> > generate it.
> > That way we can have newer versions of the schema and still be able to
> > unpack them. In this form the classname is essentially an indicator of
> > whether schema migration is possible, even though the schemas are
> > different.
> >
> > What do you think of this direction?
> >
> > Niels
> >
> >
> > On Mon, Dec 21, 2015 at 11:30 PM, Ryan Blue <blue@cloudera.com> wrote:
> >
> >> Niels,
> >>
> >> Having methods like this sounds like a good idea to me. I've had to
> >> write those methods several times!
> >>
> >> The idea is also related to AVRO-1704 [1], which is a suggestion to
> >> standardize the encoding that is used for single records. Some projects
> >> have been embedding the schema fingerprint at the start of each record,
> >> for example, which would be a helpful thing to do.
> >>
> >> It may also be a good idea to create a helper object rather than
> >> attaching new methods to the datum classes themselves. In your example
> >> below, you have to create a new encoder or decoder for each method call.
> >> We could instead keep a backing buffer and encoder/decoder on a class
> >> that the caller instantiates so that they can be reused. At the same
> >> time, that would make it possible to reuse the class with any data model
> >> and manage the available schemas (if embedding the fingerprint).
> >>
> >> I'm thinking something like this:
> >>
> >>   ReflectClass datum = new ReflectClass();
> >>   ReflectData model = ReflectData.get();
> >>   DatumCodec codec = new DatumCodec(model, schema);
> >>
> >>   // convert datum to bytes using data model
> >>   byte[] asBytes = codec.toBytes(datum);
> >>
> >>   // convert bytes to datum using data model
> >>   ReflectClass copy = codec.fromBytes(asBytes);
> >>
> >> What do you think?
> >>
> >> rb
> >>
> >>
> >> [1]: https://issues.apache.org/jira/browse/AVRO-1704
> >>
> >>
> >> On 12/18/2015 05:01 AM, Niels Basjes wrote:
> >>
> >>> Hi,
> >>>
> >>> I'm working on a project where I'm putting Avro records into Kafka and
> >>> at the other end pull them out again.
> >>> For that purpose I wrote two methods 'toBytes' and 'fromBytes' in a
> >>> separate class (see below).
> >>>
> >>> I see this as the type of problem many developers run into.
> >>> Would it be a good idea to generate methods like these into the
> >>> generated Java code?
> >>>
> >>> This would make it possible to serialize and deserialize single
> >>> records like this:
> >>>
> >>> byte [] someBytes = measurement.toBytes();
> >>> Measurement m = Measurement.fromBytes(someBytes);
> >>>
> >>> Niels Basjes
> >>>
> >>> P.S. Possibly name it getBytes rather than toBytes (similar to what
> >>> the String class has)
> >>>
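> >>> import java.io.ByteArrayOutputStream;
> >>> import java.io.IOException;
> >>>
> >>> import org.apache.avro.io.DatumReader;
> >>> import org.apache.avro.io.Decoder;
> >>> import org.apache.avro.io.DecoderFactory;
> >>> import org.apache.avro.io.Encoder;
> >>> import org.apache.avro.io.EncoderFactory;
> >>> import org.apache.avro.specific.SpecificDatumReader;
> >>> import org.apache.avro.specific.SpecificDatumWriter;
> >>>
> >>> public final class MeasurementSerializer {
> >>>     private MeasurementSerializer() {
> >>>     }
> >>>
> >>>     // Decodes a single binary-encoded Measurement record.
> >>>     public static Measurement fromBytes(byte[] bytes) throws IOException {
> >>>         try {
> >>>             DatumReader<Measurement> reader =
> >>>                 new SpecificDatumReader<>(Measurement.getClassSchema());
> >>>             Decoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
> >>>             return reader.read(null, decoder);
> >>>         } catch (RuntimeException rex) {
> >>>             // Keep the original exception as the cause.
> >>>             throw new IOException(rex.getMessage(), rex);
> >>>         }
> >>>     }
> >>>
> >>>     // Encodes a single Measurement record using Avro binary encoding.
> >>>     public static byte[] toBytes(Measurement measurement) throws IOException {
> >>>         try {
> >>>             ByteArrayOutputStream out = new ByteArrayOutputStream();
> >>>             Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
> >>>             SpecificDatumWriter<Measurement> writer =
> >>>                 new SpecificDatumWriter<>(Measurement.class);
> >>>             writer.write(measurement, encoder);
> >>>             encoder.flush();
> >>>             out.close();
> >>>             return out.toByteArray();
> >>>         } catch (RuntimeException rex) {
> >>>             // Keep the original exception as the cause.
> >>>             throw new IOException(rex.getMessage(), rex);
> >>>         }
> >>>     }
> >>> }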
> >>>
> >>>
> >>>
> >>>
> >>
> >> --
> >> Ryan Blue
> >> Software Engineer
> >> Cloudera, Inc.
> >>
> >
> >
> >
> > --
> > Best regards / Met vriendelijke groeten,
> >
> > Niels Basjes
> >
>
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
