avro-user mailing list archives

From Wai Yip Tung ...@tungwaiyip.info>
Subject Re: Union resolution in dynamic languages
Date Thu, 05 Jun 2014 17:40:09 GMT
That's good to know. Would you mind sharing your use case with us?

Wai Yip

> Grant Overby (groverby) <mailto:groverby@cisco.com>
> Thursday, June 05, 2014 6:46 AM
> Disallowing multiple named types within a union would break our use cases.
>
> We have a similar problem. With two record types in a union, the 
> Python driver doesn’t choose well.
>
> We solved this problem by adding a pseudo-reserved key to the dict to 
> indicate which named type to use. I started the process of open 
> sourcing that patch a few days ago. It’s definitely a hack, but I’m 
> hoping the community will accept it.
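
A minimal sketch of the reserved-key idea described above, not the actual patch. The key name `__avro_type__` and the helpers `index_named_branches` and `pick_branch` are hypothetical; the thread does not say what the patch calls them. Schemas are assumed to be plain parsed-JSON dicts.

```python
# Hypothetical reserved key; the name used by the actual patch is not given in this thread.
TYPE_KEY = "__avro_type__"


def index_named_branches(union_schemas):
    """Map each named branch's name to its (position, schema) in the union."""
    return {
        s["name"]: (i, s)
        for i, s in enumerate(union_schemas)
        if isinstance(s, dict) and "name" in s
    }


def pick_branch(branch_index, datum):
    """Pick the union branch named by TYPE_KEY with a single dict lookup."""
    name = datum.get(TYPE_KEY)
    if name not in branch_index:
        raise ValueError(f"datum does not name a known union branch: {name!r}")
    return branch_index[name]


# Two record types with identical fields: content alone cannot tell them apart.
union = [
    {"type": "record", "name": "Manager",
     "fields": [{"name": "name", "type": "string"}]},
    {"type": "record", "name": "NonManager",
     "fields": [{"name": "name", "type": "string"}]},
]
branches = index_named_branches(union)  # built once per union schema
position, schema = pick_branch(branches, {TYPE_KEY: "NonManager", "name": "Ann"})
```

Because the name-to-branch map is built once per schema, each datum costs one dict lookup, which is the O(1) behaviour for named types mentioned in this message.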
>
> Our patch doesn’t change the time complexity. From a brief glance, 
> choosing within the union typically seems to be O(n), as the recursion 
> short-circuits. For named types, the complexity could be O(1). 
> Achieving O(1) for non-named types seems achievable too. How many 
> projects are impacted by this ‘wasted’ complexity? Simpler code might 
> be better than faster code.
>
> *Grant Overby*
> Software Engineer
> Cisco.com
> groverby@cisco.com <mailto:groverby@cisco.com>
> Mobile: *865 724 4910*
>
>
> From: Wai Yip Tung <wy@tungwaiyip.info <mailto:wy@tungwaiyip.info>>
> Reply-To: "user@avro.apache.org <mailto:user@avro.apache.org>" 
> <user@avro.apache.org <mailto:user@avro.apache.org>>
> Date: Wednesday, June 4, 2014 at 9:34 PM
> To: "user@avro.apache.org <mailto:user@avro.apache.org>" 
> <user@avro.apache.org <mailto:user@avro.apache.org>>
> Subject: Re: Union resolution in dynamic languages
>
> Also, I ask about this in the context of building an optimized encoder. 
> For this implementation, resolution would be much simpler if a union 
> could not contain two records, similar to how the spec does not 
> allow two array or two map types. I wonder whether this limit would 
> break any significant use case.
>
> Wai Yip
> Wai Yip Tung <mailto:wy@tungwaiyip.info>
> Wednesday, June 04, 2014 4:40 PM
> When encoding data of a union type, the Avro specification does not say 
> much about how to choose which type in the union is used. So far I have 
> mostly used unions so that I can write null or another simple type. In 
> these cases, it is fairly obvious how the encoder can distinguish null 
> from the other types.
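
To illustrate the simple case above, here is a small sketch that encodes a value against the union `["null", "string"]`. It follows the Avro binary encoding, where a union is written as the zero-based branch position (a zigzag varint long) followed by the value. The function names `encode_long` and `encode_optional_string` are made up for this example and are not part of the Python Avro library.

```python
def encode_long(n: int) -> bytes:
    """Encode an Avro long: zigzag, then variable-length base-128."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps small negatives to small positives
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_optional_string(value) -> bytes:
    """Encode a value against the union ["null", "string"]."""
    if value is None:
        return encode_long(0)  # branch 0 is null; null has no payload
    if isinstance(value, str):
        data = value.encode("utf-8")
        # branch 1 is string: length as a long, then the UTF-8 bytes
        return encode_long(1) + encode_long(len(data)) + data
    raise TypeError(f"value fits no branch of the union: {value!r}")
```

Here the Python type of the value alone decides the branch, which is why this kind of union causes no resolution problem.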
>
> However, a union can also contain any named types, so it can hold two 
> records, let's say a Manager record and a NonManager record. I think in 
> strongly typed languages, the suitable type in the union can be 
> selected by introspection. But in dynamic languages, these might just 
> be represented as maps without any notion of type. In some cases, we 
> may find that the object has all the attributes of a NonManager but 
> not those of a Manager, so we can conclude that NonManager is the 
> proper schema to use. But this can get complicated with nested data 
> structures, where the attribute that disambiguates things appears at a 
> deeper level. Or you can think of valid scenarios where inspecting the 
> content of the object cannot unambiguously resolve the union branch.
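
The ambiguity described above can be shown with a deliberately simplified sketch. It matches a dict against record branches by field names only, ignoring field types, defaults, and nesting, so it is not how the Python implementation validates. The helper `matching_branches`, the `reports` field, and the `Contractor` record are hypothetical, introduced only for illustration.

```python
def matching_branches(union_schemas, datum):
    """Return names of record branches whose field names equal the datum's keys."""
    keys = set(datum)
    return [
        s["name"]
        for s in union_schemas
        if s.get("type") == "record"
        and {f["name"] for f in s["fields"]} == keys
    ]


manager = {"type": "record", "name": "Manager",
           "fields": [{"name": "name", "type": "string"},
                      {"name": "reports", "type": "int"}]}
non_manager = {"type": "record", "name": "NonManager",
               "fields": [{"name": "name", "type": "string"}]}
# Hypothetical third record with the same shape as NonManager.
contractor = {"type": "record", "name": "Contractor",
              "fields": [{"name": "name", "type": "string"}]}

# Distinct field sets: the content resolves the branch.
resolved = matching_branches([manager, non_manager], {"name": "Ann"})

# Identical field sets: inspecting the content cannot decide.
ambiguous = matching_branches([non_manager, contractor], {"name": "Ann"})
```

In the second call both branches match, so an encoder has to fall back on some rule such as taking the first match, or on extra information supplied by the caller.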
>
> I notice that the Python implementation uses a two-pass recursive 
> validation, possibly for the purpose of resolving the union choice.
>
> I wonder whether much consideration has been given to potentially 
> complex, indirectly nested union types that might be difficult to 
> resolve, and that thus add complexity to the implementation of encoders. 
> Are there use cases in practice that involve complex union decisions?
>
> Wai Yip
>
