avro-dev mailing list archives

From "Yibing Shi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1817) Allow enums to be "promoted" to strings
Date Fri, 29 Jul 2016 05:55:20 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398756#comment-15398756
] 

Yibing Shi commented on AVRO-1817:
----------------------------------

[~busbey], I am not sure this task is feasible, especially when the binary encoder/writer
is used.
AFAICS, in {{GenericDatumWriter}}, an enum value is written as its offset in the schema's enum
symbol list.
{code}
  protected void writeEnum(Schema schema, Object datum, Encoder out)
    throws IOException {
    if (!data.isEnum(datum))
      throw new AvroTypeException("Not an enum: "+datum);
    out.writeEnum(schema.getEnumOrdinal(datum.toString()));
  }
{code}

If {{BinaryEncoder}} is used, this offset is written *through* to the data file as a plain int,
without any flags added to it.
{code}
  public void writeEnum(int e) throws IOException {
    this.writeInt(e);
  }
{code}

In the datum reader and decoder, it is very hard, if not impossible, to tell from the bytes
alone whether the data to read is an enum or an actual string. Things get even more
complicated if Unicode strings are considered.
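
To make the concern concrete, here is a minimal standalone sketch of the two encodings. It does not use the Avro library. It hand-rolls the zig-zag variable-length int encoding described in the Avro specification, and the class and method names ({{EnumVsStringBytes}}, {{encodeEnum}}, {{encodeString}}) are made up for illustration. It assumes the {{Colour}} enum from the issue description, where YELLOW has ordinal 1.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not part of Avro.
public class EnumVsStringBytes {

    // Zig-zag followed by 7-bit variable-length encoding, as the Avro
    // specification describes for int and long values.
    static void writeVarLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            z >>>= 7;
        }
        out.write((int) z);
    }

    // An enum is encoded as nothing more than the int ordinal of its symbol.
    static byte[] encodeEnum(int ordinal) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarLong(out, ordinal);
        return out.toByteArray();
    }

    // A string is encoded as a length (in bytes) followed by its UTF-8 bytes.
    static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Colour.YELLOW written as an enum (ordinal 1).
        System.out.println(Arrays.toString(encodeEnum(1)));
        // The same value written as the string "YELLOW".
        System.out.println(Arrays.toString(encodeString("YELLOW")));
    }
}
```

Neither byte sequence carries a type tag. A decoder that expects a string would take the single enum byte as a length prefix and then consume whatever bytes follow as string content.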

> Allow enums to be "promoted" to strings
> ---------------------------------------
>
>                 Key: AVRO-1817
>                 URL: https://issues.apache.org/jira/browse/AVRO-1817
>             Project: Avro
>          Issue Type: Improvement
>          Components: java, spec
>            Reporter: Michael Overmeyer
>            Priority: Minor
>
> We should consider adding a resolution rule that can promote an enum to a string using
the enum's symbol.
> I have an Avro schema that has a field with an enum type. However, I have realized that
an enum is not the type I actually wanted. I would much rather have the type of the field
be a string. I went to change this, but of course this type of change (enum -> string)
is not within the bounds of Avro's schema evolution. Therefore a reader with this changed
schema is not able to read an object written with the old schema.
> For example, if the writer schema was:
> enum Colour {
>   RED, YELLOW, GREEN 
> }
> protocol stoplight {
>   Colour colour;
> }
> And the reader schema was:
> protocol stoplight {
>   string colour;
> }
> Then when you access the colour field of your object, you get the string representation
of the enum value's symbol.
> For example, Colour.RED => "RED", Colour.YELLOW => "YELLOW", Colour.GREEN =>
"GREEN"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
