avro-user mailing list archives

From Joe Crobak <joec...@gmail.com>
Subject optional enums
Date Mon, 20 Dec 2010 19:53:36 GMT
What's the "best" way to represent an optional enum in Avro (in terms of
space efficiency, computational efficiency, and readability)?  To be
consistent with other optional fields, I was planning to use a union of null
and my enum type.  The other approach I could see was adding a NULL symbol to
the enum, but then my code would have to initialize the enum field to NULL
before a write.
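
For a rough sense of the space side of the question, here is a minimal sketch. It assumes Avro's binary encoding as described in the specification: a union is written as the zero-based branch index followed by the value, null takes zero bytes, and an enum is written as its symbol index, with indexes encoded as zig-zag varints. The class and method names (OptionalEnumSize, unionEncodedSize, sentinelEncodedSize) are hypothetical and not part of Avro; the code only counts bytes and does not call the Avro library.

```java
import java.io.ByteArrayOutputStream;

public class OptionalEnumSize {
    // Zig-zag varint encoding, the scheme the Avro spec describes for ints and longs.
    static void writeZigZagLong(long n, ByteArrayOutputStream out) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Union ["null", enum]: branch index first, then the value.
    // A null value contributes zero bytes after the branch index.
    static int unionEncodedSize(Integer symbolIndex) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (symbolIndex == null) {
            writeZigZagLong(0, out);           // branch 0: null
        } else {
            writeZigZagLong(1, out);           // branch 1: the enum
            writeZigZagLong(symbolIndex, out); // enum symbol index
        }
        return out.size();
    }

    // Enum with an extra NULL symbol: only the symbol index is written.
    static int sentinelEncodedSize(int symbolIndex) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeZigZagLong(symbolIndex, out);
        return out.size();
    }

    public static void main(String[] args) {
        System.out.println("union, null value: " + unionEncodedSize(null) + " byte(s)");
        System.out.println("union, SPADES:     " + unionEncodedSize(0) + " byte(s)");
        System.out.println("enum with NULL:    " + sentinelEncodedSize(0) + " byte(s)");
    }
}
```

Under these assumptions the union costs one extra byte per non-null value (2 bytes versus 1), while a null value costs 1 byte either way.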

I've tried the union of null and the enum type, but I've run into an
issue with this approach when using AvroOutputFormat.  The following
code summarizes my issue:

  public void testDataWriteWithSchema() throws IOException {
    final DataFileWriter<Event> writer =
      new DataFileWriter<Event>(new SpecificDatumWriter<Event>());

    writer.create(Event.SCHEMA$, new File("target/datafile-test.avro"));
    writer.append(getEvent());
    writer.close();
  }

  public void testDataWriteWithSchemaWithClass() throws IOException {
    final DataFileWriter<Event> writer =
      new DataFileWriter<Event>(new SpecificDatumWriter<Event>(Event.class));

    writer.create(Event.SCHEMA$, new File("target/datafile-test.avro"));
    writer.append(getEvent());
    writer.close();
  }


When I don't pass Event.class to SpecificDatumWriter (the first test
method), the test fails with the following exception:

Not in union ["null", {"type":"enum","name":"Suit","namespace":"foo","symbols":["SPADES","CLUBS","HEARS","DIAMONDS"]}]: SPADES
  at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:382)
  at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:67)
  at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:100)
  at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
  at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:54)
  at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:245)
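
The trace goes through GenericData.resolveUnion, which suggests the writer is resolving the union with the generic data model and not the specific one. The following is a simplified stand-in, not Avro's code: it models union resolution as "return the index of the first branch that accepts the datum, otherwise throw", and it assumes (unverified against this Avro version) that the generic model accepts only its own symbol type for enum branches while the specific model also accepts Java enum constants. All names here (UnionResolutionSketch, GenericSymbol, the predicates) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class UnionResolutionSketch {
    // Hypothetical stand-in for the generic model's enum value type.
    static final class GenericSymbol {
        final String name;
        GenericSymbol(String name) { this.name = name; }
    }

    // Stand-in for a generated (specific) enum.
    enum Suit { SPADES, CLUBS }

    // Return the index of the first branch whose check accepts the datum;
    // throw if no branch matches, mirroring the "Not in union" failure.
    static int resolveUnion(List<Predicate<Object>> branches, Object datum) {
        for (int i = 0; i < branches.size(); i++) {
            if (branches.get(i).test(datum)) {
                return i;
            }
        }
        throw new IllegalStateException("Not in union: " + datum);
    }

    static final Predicate<Object> IS_NULL = d -> d == null;
    // Assumed generic-model check: only the generic symbol type counts as an enum.
    static final Predicate<Object> GENERIC_ENUM = d -> d instanceof GenericSymbol;
    // Assumed specific-model check: Java enum constants count as well.
    static final Predicate<Object> SPECIFIC_ENUM =
        d -> d instanceof Enum || d instanceof GenericSymbol;

    static final List<Predicate<Object>> GENERIC_UNION = Arrays.asList(IS_NULL, GENERIC_ENUM);
    static final List<Predicate<Object>> SPECIFIC_UNION = Arrays.asList(IS_NULL, SPECIFIC_ENUM);

    public static void main(String[] args) {
        System.out.println("specific model branch: " + resolveUnion(SPECIFIC_UNION, Suit.SPADES));
        try {
            resolveUnion(GENERIC_UNION, Suit.SPADES);
        } catch (IllegalStateException e) {
            System.out.println("generic model: " + e.getMessage());
        }
    }
}
```

If that assumption holds, it would explain why passing Event.class (the second test method) avoids the exception while the default constructor does not.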


AvroOutputFormat uses SpecificDatumWriter's default constructor, so I run
into the above exception when using it.  Is there some way around this, other
than implementing my own OutputFormat that passes along the class?

Thanks,
Joe
