Date: Mon, 20 Dec 2010 14:53:36 -0500
Subject: optional enums
From: Joe Crobak
To: user@avro.apache.org
What's the "best" way to represent an optional enum in Avro (in terms of space efficiency, computational efficiency, and readability)? To be consistent with other optional fields, I was planning to use a union of null and my enum type. The other approach I could see was adding a NULL symbol to the enum, but then my code would have to initialize the enum field to that NULL symbol before a write.
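On the space question, the Avro specification says a union is written as the zero-based index of the chosen branch followed by the value, null is written as zero bytes, an enum is written as an int holding the symbol's position, and ints use zig-zag variable-length coding. The sketch below hand-encodes both options under those rules. The class and method names are hypothetical, and it assumes a small enum whose positions fit in one varint byte (up to 63 symbols).

```java
import java.io.ByteArrayOutputStream;

public class OptionalEnumSize {
    // Zig-zag variable-length int, as the Avro spec describes for int/long.
    static void writeInt(ByteArrayOutputStream out, int n) {
        int z = (n << 1) ^ (n >> 31);     // zig-zag: 0->0, 1->2, -1->1, ...
        while ((z & ~0x7F) != 0) {        // more than 7 bits remain
            out.write((z & 0x7F) | 0x80); // low 7 bits plus continuation bit
            z >>>= 7;
        }
        out.write(z);
    }

    // Option 1: union ["null", Suit]; a null ordinal means "absent".
    static byte[] encodeUnion(Integer ordinal) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (ordinal == null) {
            writeInt(out, 0);       // branch 0 ("null"); null itself adds no bytes
        } else {
            writeInt(out, 1);       // branch 1 (the enum)
            writeInt(out, ordinal); // symbol position within the enum
        }
        return out.toByteArray();
    }

    // Option 2: enum with an extra NULL symbol; "absent" is just that symbol.
    static byte[] encodeEnumWithNullSymbol(int ordinal) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeInt(out, ordinal);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // SPADES is position 0 in Suit; assume a NULL symbol appended at position 4.
        System.out.println(encodeUnion(null).length);           // 1
        System.out.println(encodeUnion(0).length);              // 2
        System.out.println(encodeEnumWithNullSymbol(0).length); // 1
        System.out.println(encodeEnumWithNullSymbol(4).length); // 1
    }
}
```

Under these assumptions the union costs one extra byte per non-null value, while the NULL-symbol enum is always one byte but puts "absent" into the enum's symbol list.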

I've tried using a union of null and the enum type, but I've run into an issue with this approach when using AvroOutputFormat. The following code summarizes my issue:

  import java.io.File;
  import java.io.IOException;
  import org.apache.avro.file.DataFileWriter;
  import org.apache.avro.specific.SpecificDatumWriter;

  // Event is the generated specific class; getEvent() builds a test instance.
  // Fails with "Not in union": the writer is created without Event.class.
  public void testDataWriteWithSchema() throws IOException {
    final DataFileWriter<Event> writer =
      new DataFileWriter<Event>(new SpecificDatumWriter<Event>());
    writer.create(Event.SCHEMA$, new File("target/datafile-test.avro"));
    writer.append(getEvent());
    writer.close();
  }

  // Passing Event.class to SpecificDatumWriter avoids the failure.
  public void testDataWriteWithSchemaWithClass() throws IOException {
    final DataFileWriter<Event> writer =
      new DataFileWriter<Event>(new SpecificDatumWriter<Event>(Event.class));
    writer.create(Event.SCHEMA$, new File("target/datafile-test.avro"));
    writer.append(getEvent());
    writer.close();
  }


When I don't pass Event.class to SpecificDatumWriter (the first test method), the test fails with the following exception:

Not in union ["null", {"type":"enum","name":"Suit","namespace":"foo","symbols":["SPADES","CLUBS","HEARS","DIAMONDS"]}]: SPADES
    at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:382)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:67)
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:100)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:54)
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:245)
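The trace shows the union being resolved in GenericData.resolveUnion, which suggests the default-constructed writer is using the generic data model, and that model's check for the enum branch does not accept the generated Java enum constant SPADES. The sketch below is a simplified, hypothetical model of that branch selection, not Avro's code: GenericSymbol, GENERIC_UNION, SPECIFIC_UNION and this resolveUnion are illustrative names. Each union branch gets a predicate and the first matching branch wins, so a Java enum value matches nothing under the "generic" predicates and resolves to branch 1 under the "specific" ones.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class UnionResolutionModel {
    // Stand-in for the generated enum from the schema in the error message.
    enum Suit { SPADES, CLUBS, HEARS, DIAMONDS }

    // Hypothetical generic representation of an enum symbol (not an Avro class).
    static final class GenericSymbol {
        final String symbol;
        GenericSymbol(String symbol) { this.symbol = symbol; }
        @Override public String toString() { return symbol; }
    }

    // Branch predicates for the union ["null", Suit] under two data models.
    static final List<Predicate<Object>> GENERIC_UNION = Arrays.asList(
        d -> d == null,                                        // branch 0: "null"
        d -> d instanceof GenericSymbol);                      // branch 1: generic enum only
    static final List<Predicate<Object>> SPECIFIC_UNION = Arrays.asList(
        d -> d == null,                                        // branch 0: "null"
        d -> d instanceof Enum || d instanceof GenericSymbol); // branch 1: also Java enums

    // Return the index of the first branch whose predicate accepts the datum.
    static int resolveUnion(List<Predicate<Object>> branches, Object datum) {
        for (int i = 0; i < branches.size(); i++) {
            if (branches.get(i).test(datum)) {
                return i;
            }
        }
        throw new IllegalArgumentException("Not in union: " + datum);
    }

    public static void main(String[] args) {
        System.out.println(resolveUnion(SPECIFIC_UNION, Suit.SPADES)); // 1
        System.out.println(resolveUnion(SPECIFIC_UNION, null));        // 0
        try {
            resolveUnion(GENERIC_UNION, Suit.SPADES);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Not in union: SPADES
        }
    }
}
```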


AvroOutputFormat uses SpecificDatumWriter's default constructor, so I run into the above exception when using it. Is there some way around this (other than implementing my own OutputFormat that passes along the class)?

Thanks,
Joe
