avro-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From S G <sg.online.em...@gmail.com>
Subject Re: Need review/merges for couple of pull requests
Date Thu, 02 Jun 2016 00:36:33 GMT
Thanks RB,

But I am not sure I follow that example (Probably because there is no
actual datum class there?).

Consider a complex code running in production with lots of circular
references.
Something like:

class Home {
   String address;
   Integer zipCode;
   List <Person> residents;
}

class Person {
    String name;
    Person spouse;
    House home;
    Map <Car, Dealer> vehiclesOwned;
}

class Car {
     String make;
     Integer year;
     Dealer soldBy;
     Person ownedBy;
}

class Dealer {
    String name;
    String address;
    List<Car> vehiclesSold;
}

The above is a made-up example, intentionally using easy-to-understand
references.
But it does show that circular references can become very complex and
across multiple chains.

If users have to write Conversion classes for all of the above along with
hash-map(s) that finds out IDs for the objects seen already,  writes them
out as separate fields and then reads them back to replace them with actual
objects, then it would be expecting too much from our clients IMO.

Perhaps it would be more helpful if there was a more seamless way of
getting this done automatically.

SG



On Wed, Jun 1, 2016 at 5:22 PM, Ryan Blue <rblue@netflix.com.invalid> wrote:

> Those aren't the datum classes, those are logical types that are added to
> the schema for your datum classes. The "referenceable" logical type is
> applied to the class that gets replaced with an ID reference and points to
> the field to use for that ID. The "reference" logical type is applied to
> the class that contains a referencable datum. Then there are conversions
> that replace a referenceable with its ID.
>
> On Wed, Jun 1, 2016 at 4:52 PM, S G <sg.online.email@gmail.com> wrote:
>
> > RB,
> >
> > The test TestCircularReferences shows that the classes having circular
> > reference need to extend LogicalType.
> > If every circular reference class has to do this, then I think this would
> > be a big limitation for actual classes used in production code because
> they
> > would be already extending other classes plus asking several other teams
> to
> > change their base-class would be difficult.
> > Is there a way to do this without making the classes extend the
> LogicalType
> > ?
> >
> > SG
> >
> >
> > On Wed, Jun 1, 2016 at 3:29 PM, Ryan Blue <rblue@netflix.com.invalid>
> > wrote:
> >
> > > SG,
> > >
> > > The example/test I built uses logical types to remove the circular
> > > reference when writing and restore it when reading. It doesn't look
> like
> > > your test adds logical types, so that's probably the issue.
> > >
> > > rb
> > >
> > > On Wed, Jun 1, 2016 at 1:51 PM, S G <sg.online.email@gmail.com> wrote:
> > >
> > > > Hey RB,
> > > >
> > > > I was trying out the circular refs with latest 1.8.1 version of Avro
> > and
> > > it
> > > > doesn't seem to be working out of the box for me.
> > > > Perhaps I am missing something and would appreciate your help.
> > > >
> > > > Thanks,
> > > > SG
> > > >
> > > >
> > > > Here is my test code:
> > > >
> > > > public class CircularRefsTest
> > > > {
> > > >     @Test
> > > >     public void testSerialization() throws Exception
> > > >     {
> > > >         // Create a circular linked-list
> > > >         ListNode first = new ListNode("first");
> > > >         ListNode second = new ListNode("second");
> > > >         second.next = first;
> > > >         first.next = second;
> > > >
> > > >         ReflectData rdata = ReflectData.AllowNull.get();
> > > >
> > > >         // Print Schema
> > > >         Schema schema = rdata.getSchema(ListNode.class);
> > > >         System.out.println(schema);
> > > >
> > > >         // Serialize
> > > >         DatumWriter<ListNode> datumWriter = new
> > > > ReflectDatumWriter<ListNode>(ListNode.class);
> > > >         ByteArrayOutputStream byteArrayOutputStream = new
> > > > ByteArrayOutputStream();
> > > >         Encoder encoder =
> > > > EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null);
> > > >         datumWriter.write(first, encoder);
> > > >         encoder.flush();
> > > >         byte[] bytes = byteArrayOutputStream.toByteArray();
> > > >
> > > >         assertTrue( bytes != null );
> > > >     }
> > > > }
> > > >
> > > > class ListNode
> > > > {
> > > >     String data;
> > > >     ListNode next;
> > > >
> > > >     public ListNode() {
> > > >     }
> > > >     public ListNode(String data) {
> > > >         this.data = data;
> > > >     }
> > > > }
> > > >
> > > >
> > > >
> > > > On Thu, May 28, 2015 at 11:00 AM, Ryan Blue <blue@cloudera.com>
> wrote:
> > > >
> > > > > SG,
> > > > >
> > > > > The data ends up looking like this:
> > > > >
> > > > > {"id":1,"p":"parent data!","child":{"c":"child
> > > > data!","parent":{"long":1}}}
> > > > >
> > > > > I produced that with avro-tools 1.7.6 and tojson.
> > > > >
> > > > > Here's the schema: {
> > > > >   "type" : "record",
> > > > >   "name" : "Parent",
> > > > >   "fields" : [ {
> > > > >     "name" : "id",
> > > > >     "type" : "long"
> > > > >   }, {
> > > > >     "name" : "p",
> > > > >     "type" : "string"
> > > > >   }, {
> > > > >     "name" : "child",
> > > > >     "type" : {
> > > > >       "type" : "record",
> > > > >       "name" : "Child",
> > > > >       "fields" : [ {
> > > > >         "name" : "c",
> > > > >         "type" : "string"
> > > > >       }, {
> > > > >         "name" : "parent",
> > > > >         "type" : [ "null", "long", "Parent" ]
> > > > >       } ],
> > > > >       "logicalType" : "reference",
> > > > >       "ref-field-name" : "parent"
> > > > >     }
> > > > >   } ],
> > > > >   "logicalType" : "referenceable",
> > > > >   "id-field-name" : "id"
> > > > > }
> > > > >
> > > > > rb
> > > > >
> > > > >
> > > > > On 05/28/2015 09:50 AM, S G wrote:
> > > > >
> > > > >> RB,
> > > > >>
> > > > >> Could you please attach the schema and the JSON serialized output
> > from
> > > > >> your
> > > > >> test-code as well?
> > > > >> My build environment is currently broken as I am grappling with
> some
> > > > Java
> > > > >> 8
> > > > >> update issues.
> > > > >>
> > > > >> Thanks
> > > > >> SG
> > > > >>
> > > > >>
> > > > >> On Wed, May 27, 2015 at 3:28 PM, Ryan Blue <blue@cloudera.com>
> > wrote:
> > > > >>
> > > > >> SG,
> > > > >>>
> > > > >>> Now that logical types are in, I had some time to look at
this
> > issue.
> > > > >>> Thanks for your patience on this.
> > > > >>>
> > > > >>> When I started looking at the use case, this began to look
very
> > much
> > > > like
> > > > >>> a logical type issue. (I know, I've been saying that a lot.)
When
> > you
> > > > >>> write, you replace any referenced object with its id. When
you
> > read,
> > > > you
> > > > >>> replace those ids with the correct object. I went ahead and
> > > implemented
> > > > >>> this using 2 logical types: Referenceable for an object with
an
> id,
> > > and
> > > > >>> Reference for a record with a field that references another
> object
> > by
> > > > id.
> > > > >>>
> > > > >>> There were some trade-offs to this approach. For example,
I had
> to
> > > use
> > > > a
> > > > >>> logical type for the object containing the reference rather
than
> > for
> > > > the
> > > > >>> reference itself because I used a callback to set the object
once
> > it
> > > is
> > > > >>> found. That happens because children with references to a
parent
> > are
> > > > >>> deserialized completely first. The parent is the last object
to
> be
> > > > >>> assembled and passed to the logical type conversion.
> > > > >>>
> > > > >>> A bigger issue was that logical types are currently conservative
> > and
> > > > >>> don't
> > > > >>> overlap with reflect types or anything that sets java-class.
That
> > > means
> > > > >>> that this currently only works with generic types. But, I'd
> rather
> > > make
> > > > >>> logical types work for reflect than add more custom code
to
> support
> > > > this.
> > > > >>> Does that sound reasonable?
> > > > >>>
> > > > >>> I'm attaching a diff with the working test code so you can
take a
> > > look
> > > > at
> > > > >>> the approach. Let me know what you are thinking.
> > > > >>>
> > > > >>> rb
> > > > >>>
> > > > >>> On 05/20/2015 12:05 PM, S G wrote:
> > > > >>>
> > > > >>> I am requesting some help with AVRO-695.
> > > > >>>> Here are some pieces from the last conversation.
> > > > >>>>
> > > > >>>>
> > > > >>>> Doug Cutting
> > > > >>>> <
> > > https://issues.apache.org/jira/secure/ViewProfile.jspa?name=cutting>
> > > > >>>> added
> > > > >>>> a comment - 02/Oct/14 21:19
> > > > >>>>
> > > > >>>> Here's a modified version of the patch. It moves all
significant
> > > > changes
> > > > >>>> to
> > > > >>>> reflect. Since reflect is the only implementation that
can
> > generate
> > > an
> > > > >>>> appropriate schema, changes should be confined to there.
> > > > >>>>
> > > > >>>> The tests need to be updated, as they still assume generic.
> > > > >>>> <https://issues.apache.org/jira/browse/AVRO-695#>
> > > > >>>> <
> > > > >>>>
> > > > >>>>
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/AVRO-695?focusedCommentId=14286370&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14286370
> > > > >>>>
> > > > >>>>>
> > > > >>>>> Sachin Goyal
> > > > >>>> <
> > > > >>>>
> > > >
> > https://issues.apache.org/jira/secure/ViewProfile.jspa?name=sachingoyal
> > > > >>>> >
> > > > >>>> added
> > > > >>>> a comment - 21/Jan/15 21:56
> > > > >>>>
> > > > >>>> Here is a patch with the updated test-cases.
> > > > >>>> I also confirm that all my changes are there in the patch.
> > > > >>>>
> > > > >>>>
> > > > >>>> The patch was submitted in June 2014 and was very hot
till
> October
> > > > 2014.
> > > > >>>> Since then, there has been no action on this even though
I have
> > sent
> > > > >>>> many
> > > > >>>> reminders in this group.
> > > > >>>>
> > > > >>>> I understand that everyone is very busy with their own
stuff
> but I
> > > > would
> > > > >>>> really appreciate if someone could help a fellow engineer
in
> > getting
> > > > his
> > > > >>>> patch accepted.
> > > > >>>> It would encourage more participation as well as help
people
> > wanting
> > > > to
> > > > >>>> use
> > > > >>>> Avro for circular references.
> > > > >>>>
> > > > >>>> Regards
> > > > >>>> SG
> > > > >>>>
> > > > >>>>
> > > > >>>
> > > > >>> --
> > > > >>> Ryan Blue
> > > > >>> Software Engineer
> > > > >>> Cloudera, Inc.
> > > > >>>
> > > > >>>
> > > > >>
> > > > >
> > > > > --
> > > > > Ryan Blue
> > > > > Software Engineer
> > > > > Cloudera, Inc.
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Ryan Blue
> > > Software Engineer
> > > Netflix
> > >
> >
>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message