hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Avro/Specification2Proposals" by JohnPlevyak
Date Sat, 01 May 2010 00:02:29 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Avro/Specification2Proposals" page has been changed by JohnPlevyak.
http://wiki.apache.org/hadoop/Avro/Specification2Proposals?action=diff&rev1=2&rev2=3

--------------------------------------------------

  Efficient support could include either an explicit presence test or a function which returns the value or default value (if the field is not present).
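
As a minimal sketch of the two options just mentioned, assuming a hypothetical generated record with one optional string field: the class name `UserRecord`, the field `nickname`, and the default `"anonymous"` are all illustrative and do not come from the proposal.

```java
// Hypothetical generated record with a single optional field.
// Every name here is an assumption made for illustration.
final class UserRecord {
    private static final String DEFAULT_NICKNAME = "anonymous"; // assumed schema default
    private boolean nicknamePresent; // explicit presence flag
    private String nickname;

    // Option 1: an explicit presence test.
    boolean hasNickname() { return nicknamePresent; }

    // Option 2: a function returning the value, or the default if the field is not present.
    String getNicknameOrDefault() {
        return nicknamePresent ? nickname : DEFAULT_NICKNAME;
    }

    void setNickname(String value) { nickname = value; nicknamePresent = true; }
    void clearNickname() { nickname = null; nicknamePresent = false; }
}
```

A separate presence flag is used here rather than a null check so that the sketch would carry over to primitive fields, which cannot be null.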
  
+ == Named Unions (AVRO-248) ==
+ 
+ === Arguments in Favor ===
+ 
+   * Anonymous unions make reuse difficult (AVRO-266)
+   * Other serialization systems support names for unions, branches, and arrays
+ 
+ === Proposal ===
+ 
+  : { "type": "union", "name": "Foo", "branches": ["string", "Bar", ... ] }
+ 
+ === Language APIs ===
+ 
+ For Java, when code is generated for a union, a class could be generated that includes an enum indicating which branch of the union is taken. For example, a union of string and int named Foo might produce a Java class like
+ 
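+       public class Foo {
+         public enum Type { STRING, INT }
+         private Type type;     // which branch is currently set
+         private Object datum;  // the value held by that branch
+         public Type getType() { return type; }
+         public String getString() {
+           if (type == Type.STRING) return (String) datum;
+           // exception type is illustrative; the proposal leaves it unspecified
+           throw new IllegalStateException("current branch is " + type);
+         }
+         public void setString(String s) { type = Type.STRING; datum = s; }
+         ....
+       }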
+ 
+       Then Java applications can easily use a switch statement to process union values rather than using instanceof.
+     * When using reflection, an abstract class with a set of concrete implementations can be represented as a union (AVRO-241). However, if one wishes to create an array, one must know the name of the base class, which is not represented in the Avro schema. One approach would be to add an annotation to the reflected array schema (AVRO-242) noting the base class. But if the union itself were named, that name could identify the base class. This would also make reflected protocol interfaces more concise, since the base class name could be used in parameters, return types, and fields.
+     * Generalizing the above: Avro lacks class inheritance. Unions are a way to model inheritance, and this model is more useful if the union is named.
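
The switch-based processing described above might look like the following sketch. It assumes a generated class for a union of string and int; the class names `FooUnion` and `FooUnionDemo`, the `getInt`/`setInt` accessors, and the choice of `IllegalStateException` are assumptions made for illustration, not part of the proposal.

```java
// Sketch of a generated class for a union of string and int named Foo.
// Accessor names and the exception type are assumptions, not part of the proposal.
final class FooUnion {
    enum Type { STRING, INT }

    private Type type;     // tag: which branch is set
    private Object datum;  // value of that branch

    Type getType() { return type; }

    String getString() {
        if (type != Type.STRING) throw new IllegalStateException("current branch is " + type);
        return (String) datum;
    }

    int getInt() {
        if (type != Type.INT) throw new IllegalStateException("current branch is " + type);
        return (Integer) datum;
    }

    void setString(String s) { type = Type.STRING; datum = s; }
    void setInt(int i) { type = Type.INT; datum = i; }
}

final class FooUnionDemo {
    // Dispatch on the branch tag with a switch instead of instanceof checks.
    // Assumes a branch has been set; an unset union has a null tag.
    static String describe(FooUnion u) {
        switch (u.getType()) {
            case STRING: return "string:" + u.getString();
            case INT:    return "int:" + u.getInt();
            default:     throw new IllegalStateException("unknown branch");
        }
    }

    public static void main(String[] args) {
        FooUnion u = new FooUnion();
        u.setInt(42);
        System.out.println(describe(u)); // prints "int:42"
        u.setString("hello");
        System.out.println(describe(u)); // prints "string:hello"
    }
}
```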
+ 
+ == Named Branches (discussed in AVRO-248) ==
+ 
+ === Arguments in Favor ===
+ 
+   * Anonymous branches are not supported in some languages and require casts or type checks in others
+   * One argument against named branches was that anonymous branches are a good way of handling nullable fields, but those could instead be handled as optionals (above)
+   * Other serialization systems support names for unions, branches, and arrays
+ 
+ === Proposal ===
+ 
+  : { "type": "union", "name": "Foo", "branches": [ {"name": "URL", "type": "string"}, {"name": "hostname", "type": "string"}, ... ] }
+ 
+ === Language APIs ===
+ 
+ The language API should produce named, typed accessors in addition to the tag. Languages which have native support for named branches (e.g., C, C++, Pascal) should use an explicit tag and their native unions.
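
For a language without native unions, the named, typed accessors might be sketched as below. This covers only the two branches spelled out in the proposal (`URL` and `hostname`, both strings); the class name `FooNamedBranches`, the accessor names, and the exception type are hypothetical. Because both branches share the type string, only the tag can tell them apart, which is the situation named branches are meant to handle.

```java
// Sketch for a union named Foo with two named string branches, URL and hostname.
// Class name, accessor names, and exception type are assumptions for illustration.
final class FooNamedBranches {
    enum Branch { URL, HOSTNAME }

    private Branch branch;  // explicit tag
    private String value;   // both branches are strings, so the tag alone distinguishes them

    Branch getBranch() { return branch; }

    // Named, typed accessors: one getter and one setter per branch.
    String getUrl() { return get(Branch.URL); }
    String getHostname() { return get(Branch.HOSTNAME); }

    void setUrl(String url) { branch = Branch.URL; value = url; }
    void setHostname(String hostname) { branch = Branch.HOSTNAME; value = hostname; }

    // Returns the value only if the expected branch is the one currently set.
    private String get(Branch expected) {
        if (branch != expected) {
            throw new IllegalStateException("current branch is " + branch + ", not " + expected);
        }
        return value;
    }
}
```

With anonymous branches, an instanceof check could not distinguish these two cases, since both values are plain strings.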
+ 
