avro-dev mailing list archives

From "Thiruvalluvan M. G. (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-248) make unions a named type
Date Wed, 02 Dec 2009 03:16:21 GMT

    [ https://issues.apache.org/jira/browse/AVRO-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784604#action_12784604 ]

Thiruvalluvan M. G. commented on AVRO-248:

Talking about names, the current specification that records, enums and fixed (and now unions)
are named seems somewhat arbitrary. Names serve two main purposes:
   - Named entities can be reused elsewhere in the schema
   - Names are used to differentiate branches in unions

Strictly speaking, names are not required if neither of these situations occurs.
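
To illustrate the second purpose: the specification allows a union to hold at most one branch
of each unnamed type, while records, enums and fixed may appear several times as long as
their names differ. Below is a minimal Java sketch of that rule. `Branch` and `UnionCheck`
are hypothetical helpers written for this illustration, not part of Avro's `Schema` API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for one branch of a union; not Avro's Schema class.
final class Branch {
  final String type;  // e.g. "int", "array", "record"
  final String name;  // null for unnamed types

  Branch(String type, String name) {
    this.type = type;
    this.name = name;
  }

  // Key that tells this branch apart from its siblings:
  // the name for named types, otherwise the type itself.
  String key() {
    return name != null ? "named:" + name : "type:" + type;
  }
}

final class UnionCheck {
  // Returns true only if every branch has a distinct key.
  static boolean isValidUnion(List<Branch> branches) {
    Set<String> seen = new HashSet<String>();
    for (Branch b : branches) {
      if (!seen.add(b.key())) {
        return false;  // two branches cannot be told apart
      }
    }
    return true;
  }
}
```

Under this check, two records named A and B can share a union, but two unnamed arrays cannot.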

A third use of names is in code generation. If we can somehow handle the code generation
part, I'd propose that we make names completely optional.

Also, one should be able to name the other non-primitive types - arrays and maps. Names
for arrays and maps are not of much use for reuse, but they are very useful in unions. Today, one
cannot have a union of int arrays and string arrays. One could argue that the same effect can be
achieved by having an array of unions of int and string. But they are not the same. An array
of unions is actually a heterogeneous array - some elements can be ints and others strings.
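
The difference can be sketched in Java, assuming a named array type would map to a tagged
wrapper. `IntOrStringArray` and `ArrayOfUnions` are hypothetical classes made up for this
illustration; Avro does not generate them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a union of "array of int" and "array of string":
// the whole array is either all ints or all strings.
final class IntOrStringArray {
  enum Kind { INT_ARRAY, STRING_ARRAY }

  private final Kind kind;
  private final List<?> items;

  private IntOrStringArray(Kind kind, List<?> items) {
    this.kind = kind;
    this.items = items;
  }

  static IntOrStringArray ofInts(List<Integer> ints) {
    return new IntOrStringArray(Kind.INT_ARRAY, new ArrayList<Integer>(ints));
  }

  static IntOrStringArray ofStrings(List<String> strings) {
    return new IntOrStringArray(Kind.STRING_ARRAY, new ArrayList<String>(strings));
  }

  Kind kind() { return kind; }
  List<?> items() { return items; }
}

final class ArrayOfUnions {
  // Model of an array of union(int, string): each element is checked on its
  // own, so ints and strings may be mixed freely within one array.
  static boolean accepts(List<Object> elements) {
    for (Object e : elements) {
      if (!(e instanceof Integer) && !(e instanceof String)) {
        return false;
      }
    }
    return true;
  }
}
```

The wrapper can only be built from a homogeneous list, whereas `ArrayOfUnions.accepts` admits
a mixed list such as 1, "two", 3.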

In summary, I propose we make all compound types named, but make names optional for all of
them.

I like Doug's new syntax for unions. The earlier way of implicitly specifying unions with a
JSON array was not intuitive. If we make names optional and support both old and new syntax
for unions, the change will not break the old schemas. But I suggest we withdraw support for
the old syntax to keep the specification clean.

> make unions a named type
> ------------------------
>                 Key: AVRO-248
>                 URL: https://issues.apache.org/jira/browse/AVRO-248
>             Project: Avro
>          Issue Type: New Feature
>          Components: spec
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.3.0
> Unions are currently anonymous.  However it might be convenient if they were named.
> In particular:
>  - when code is generated for a union, a class could be generated that includes an enum
> indicating which branch of the union is taken, e.g., a union of string and int named Foo might
> cause a Java class like {code}
> public class Foo {
>   public static enum Type {STRING, INT};
>   private Type type;
>   private Object datum;
>   public Type getType() { return type; }
>   public String getString() { if (type == Type.STRING) return (String) datum; else throw ...
>   public void setString(String s) { type = Type.STRING; datum = s; }
>   ....
> }
> {code} Then Java applications can easily use a switch statement to process union values
> rather than using instanceof.
>  - when using reflection, an abstract class with a set of concrete implementations can
> be represented as a union (AVRO-241).  However, if one wishes to create an array one must
> know the name of the base class, which is not represented in the Avro schema.  One approach
> would be to add an annotation to the reflected array schema (AVRO-242) noting the base class.
> But if the union itself were named, that could name the base class.  This would also make
> reflected protocol interfaces more concise, since the base class name could be used in parameters,
> return types and fields.
>  - Generalizing the above: Avro lacks class inheritance; unions are a way to model inheritance,
> and this model is more useful if the union is named.
> This would be an incompatible change to schemas.  If we go this way, we should probably
> rename 1.3 to 2.0.  Note that AVRO-160 proposes an incompatible change to data file formats,
> which may also force a major release.
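
For reference, the class outlined in the description can be completed into a compilable sketch.
The elided parts are filled in by assumption only: the `getInt`/`setInt` accessors, the use of
`IllegalStateException`, and the `FooDemo.describe` helper are illustrative choices, not part
of the proposal.

```java
// Sketch of a class that might be generated for a union of string and int named Foo.
final class Foo {
  enum Type { STRING, INT }

  private Type type;     // which branch is currently set
  private Object datum;  // the value of that branch

  Type getType() { return type; }

  String getString() {
    if (type == Type.STRING) return (String) datum;
    throw new IllegalStateException("union holds " + type);  // assumed exception type
  }

  void setString(String s) { type = Type.STRING; datum = s; }

  int getInt() {
    if (type == Type.INT) return (Integer) datum;
    throw new IllegalStateException("union holds " + type);  // assumed exception type
  }

  void setInt(int i) { type = Type.INT; datum = i; }
}

final class FooDemo {
  // Branch on the enum with a switch instead of using instanceof.
  static String describe(Foo foo) {
    switch (foo.getType()) {
      case STRING: return "string: " + foo.getString();
      case INT:    return "int: " + foo.getInt();
      default:     throw new IllegalStateException("unknown branch");
    }
  }
}
```

Calling a getter for the branch that is not set fails fast, which is the behaviour the
`throw ...` in the original outline suggests.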

