From: Apache Wiki
To: Apache Wiki
Reply-To: common-dev@hadoop.apache.org
Date: Sat, 01 May 2010 00:02:29 -0000
Message-ID: <20100501000229.4412.95437@eos.apache.org>
Subject: [Hadoop Wiki] Update of "Avro/Specification2Proposals" by JohnPlevyak

Dear Wiki user,

You have subscribed to a wiki page or wiki category on
"Hadoop Wiki" for change notification.

The "Avro/Specification2Proposals" page has been changed by JohnPlevyak.
http://wiki.apache.org/hadoop/Avro/Specification2Proposals?action=diff&rev1=2&rev2=3

--------------------------------------------------

  Efficient support could include either an explicit presence test or a function which returns the value or default value (if the field is not present).
  
+ == Named Unions (AVRO-248) ==
+ 
+ === Arguments in Favor ===
+ 
+  * Anonymous unions make reuse difficult (AVRO-266)
+  * Other serialization systems support names for unions and branches, arrays
+ 
+ === Proposal ===
+ 
+  : { "type": "union", "name": "Foo", "branches": ["string", "Bar", ... ] }
+ 
+ === Language APIs ===
+ 
+ For Java, when code is generated for a union, a class could be generated that includes an enum indicating which branch of the union is taken, e.g., a union of string and int named Foo might produce a Java class like
+ 
+   public class Foo {
+     public static enum Type {STRING, INT};
+     private Type type;
+     private Object datum;
+     public Type getType() { return type; }
+     public String getString() { if (type == Type.STRING) return (String)datum; else throw ... }
+     public void setString(String s) { type = Type.STRING; datum = s; }
+     ....
+   }
+ 
+   Then Java applications can easily use a switch statement to process union values rather than using instanceof.
+  * When using reflection, an abstract class with a set of concrete implementations can be represented as a union (AVRO-241). However, if one wishes to create an array, one must know the name of the base class, which is not represented in the Avro schema. One approach would be to add an annotation to the reflected array schema (AVRO-242) noting the base class. But if the union itself were named, that name could identify the base class. This would also make reflected protocol interfaces more concise, since the base class name could be used in parameters, return types, and fields.
+  * Generalizing the above: Avro lacks class inheritance, unions are a way to model inheritance, and this model is more useful if the union is named.
+ 
+ == Named Branches (discussed in AVRO-248) ==
+ 
+ === Arguments in Favor ===
+ 
+  * Anonymous branches are not supported in some languages and require casts or type checks in others
+  * One argument against named branches was that anonymous branches are a good way of handling nullable fields, which could instead be handled as optionals (above)
+  * Other serialization systems support names for unions and branches, arrays
+ 
+ === Proposal ===
+ 
+  : { "type": "union", "name": "Foo", "branches": [ {"name": "URL", "type": "string"} , {"name": "hostname", "type": "string"} , ... ] }
+ 
+ === Language APIs ===
+ 
+ The language API should produce named typed accessors in addition to the tag. Languages that have native support for named branches (e.g., C, C++, Pascal) should use an explicit tag together with their native unions.
+ 
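To make "named typed accessors in addition to the tag" concrete, here is a minimal hand-written sketch of what a Java class for the Foo union with the URL and hostname branches might look like. Everything in it is an assumption for illustration, not output of any Avro tool: the nested Branch enum, the ofUrl/ofHostname factories, the accessor names, and the describe helper are all hypothetical.

```java
/** Hypothetical hand-written stand-in for a class generated from the named union "Foo". */
final class Foo {
    /** One constant per named branch; this enum plays the role of the explicit tag. */
    enum Branch { URL, HOSTNAME }

    private final Branch branch;
    private final String datum; // both branches carry a string in this example

    private Foo(Branch branch, String datum) {
        this.branch = branch;
        this.datum = datum;
    }

    // Named factories (assumed names): the branch is chosen by name, not by value type.
    static Foo ofUrl(String url) { return new Foo(Branch.URL, url); }
    static Foo ofHostname(String hostname) { return new Foo(Branch.HOSTNAME, hostname); }

    /** Returns the tag identifying which branch is set. */
    Branch getBranch() { return branch; }

    /** Typed accessor for the "URL" branch; fails if another branch is set. */
    String getUrl() {
        if (branch != Branch.URL) {
            throw new IllegalStateException("branch is " + branch + ", not URL");
        }
        return datum;
    }

    /** Typed accessor for the "hostname" branch; fails if another branch is set. */
    String getHostname() {
        if (branch != Branch.HOSTNAME) {
            throw new IllegalStateException("branch is " + branch + ", not HOSTNAME");
        }
        return datum;
    }

    /** Example consumer: switches on the tag instead of using instanceof. */
    static String describe(Foo foo) {
        switch (foo.getBranch()) {
            case URL:
                return "url=" + foo.getUrl();
            case HOSTNAME:
                return "host=" + foo.getHostname();
            default:
                throw new AssertionError(foo.getBranch());
        }
    }
}
```

Because both branches have type string, an anonymous union could not tell them apart by value type alone; in this sketch the tag is what distinguishes a URL from a hostname.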