avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-75) Clarify resolution for enums (and fix code)
Date Wed, 08 Jul 2009 18:15:14 GMT

    [ https://issues.apache.org/jira/browse/AVRO-75?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728834#action_12728834 ]

Doug Cutting commented on AVRO-75:

> This is the only place the word "unset" is used in the doc [ ... ]

That may be a bug.  We don't currently say what happens when a reader has a field that a writer
didn't write and where no default value is specified in the reader's schema.  In this case
we might provide some implementation flexibility.  An optimized implementation might have
an int field that's just set to zero in this case, with no means to differentiate this from
an actual zero value for the field, or it might provide a means to tell whether this field
was set or not.  I am not yet convinced that the spec should mandate this behavior; I would
rather it define such fields as "unset", which may or may not be detectable, depending on the
implementation.
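
To make the two possibilities concrete, here is a minimal Java sketch. The classes `CompactRecord` and `TrackedRecord` are hypothetical hand-written reader records, not Avro-generated code. They only illustrate an implementation that cannot tell an unset field from a real zero, next to one that can.

```java
// Hypothetical reader-side records; not Avro-generated classes.
final class CompactRecord {
    // Optimized: a primitive field. If the writer never wrote "count" and the
    // reader's schema has no default, it simply stays 0, so an "unset" field
    // cannot be told apart from an actual zero.
    int count;
}

final class TrackedRecord {
    // Boxed field: null means "unset", so callers can detect it.
    Integer count;

    boolean isCountSet() {
        return count != null;
    }
}

final class UnsetDemo {
    public static void main(String[] args) {
        CompactRecord a = new CompactRecord();  // writer omitted "count"
        CompactRecord b = new CompactRecord();
        b.count = 0;                            // writer wrote an actual 0
        System.out.println(a.count == b.count); // true: indistinguishable

        TrackedRecord c = new TrackedRecord();  // writer omitted "count"
        TrackedRecord d = new TrackedRecord();
        d.count = 0;                            // writer wrote an actual 0
        System.out.println(c.isCountSet() + " " + d.isCountSet()); // false true
    }
}
```

Leaving "unset" loosely defined would let both kinds of implementation conform.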

This gets back to the issue of optional/required fields.  I think the current intent is to
treat all fields as optional: if a field must have a valid value, then one can specify
a default value in the schema rather than require that all writers already have that field.
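
A sketch of the resolution order this implies, for a single reader field: use the writer's value if present, else the reader's default, else leave the field unset. It assumes records are plain `Map`s, and the `UNSET` sentinel, `ReaderField` and `resolve` are hypothetical names; the spec does not currently define such a marker.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of field resolution; not Avro's implementation.
final class FieldResolution {
    // Marker for "unset": the writer did not write the field and the reader's
    // schema gives no default. Assumed for illustration only.
    static final Object UNSET = new Object() {
        @Override
        public String toString() {
            return "<unset>";
        }
    };

    // A field in the reader's schema, with an optional default value.
    static final class ReaderField {
        final String name;
        final boolean hasDefault;
        final Object defaultValue;

        ReaderField(String name, boolean hasDefault, Object defaultValue) {
            this.name = name;
            this.hasDefault = hasDefault;
            this.defaultValue = defaultValue;
        }
    }

    // Builds the reader's view of a record the writer produced.
    // Writer fields that the reader does not declare are ignored.
    static Map<String, Object> resolve(Map<String, Object> written, List<ReaderField> readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (ReaderField field : readerFields) {
            if (written.containsKey(field.name)) {
                result.put(field.name, written.get(field.name)); // writer supplied it
            } else if (field.hasDefault) {
                result.put(field.name, field.defaultValue);      // reader's default
            } else {
                result.put(field.name, UNSET);                   // neither: unset
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> written = Map.of("name", "x", "legacy", 7);
        List<ReaderField> reader = List.of(
                new ReaderField("name", false, null),
                new ReaderField("count", true, 0),
                new ReaderField("note", false, null));
        System.out.println(resolve(written, reader)); // {name=x, count=0, note=<unset>}
    }
}
```

A reader that cannot tolerate an unset field would give it a default, so only older writers' data ever falls through to the default branch.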

> I propose that we declare this case an error [ ... ]

"Unset" may make sense in this case too.  In Java's reflect and specific APIs, an Enum instance
could be null.  This is somewhat analogous to a field that's been added to the writer but
not yet to the reader.  A reader that requires a value might provide a default value that would
be used when either the writer does not provide a value or the writer provides an unknown symbol.
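
A minimal sketch of this "unset or default" option for enums, assuming a hypothetical helper that maps the writer's symbol name onto a Java enum on the reader side. It returns the reader's default if one is given and null otherwise. This is not how GenericDatumReader or ReflectDatumReader currently behave.

```java
// Hypothetical helper for the "unset or default" option; not Avro API.
final class EnumResolution {
    // Reader's enum. Assume a newer writer's enum also has CLUBS.
    enum Suit { SPADES, HEARTS, DIAMONDS }

    // Maps the writer's symbol name onto the reader's enum. If the reader does
    // not know the symbol, returns readerDefault, which may be null ("unset").
    static <E extends Enum<E>> E resolve(Class<E> readerEnum, String writerSymbol, E readerDefault) {
        for (E constant : readerEnum.getEnumConstants()) {
            if (constant.name().equals(writerSymbol)) {
                return constant;
            }
        }
        return readerDefault;
    }

    public static void main(String[] args) {
        System.out.println(resolve(Suit.class, "HEARTS", null));       // HEARTS
        System.out.println(resolve(Suit.class, "CLUBS", null));        // null
        System.out.println(resolve(Suit.class, "CLUBS", Suit.SPADES)); // SPADES
    }
}
```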

That said, I don't have a strong feeling and would be willing to make this an error if others
can explain why that should be preferred.

> GenericDatumReader should be updated to throw an error in this case.

Or, if we decide that "unset" is useful here, we could have it use null or the default in
this case.  We'd then need to update ReflectDatumReader too, as it currently throws an exception
in this case.  In either case, the code does not conform to the spec.
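
The two candidate behaviors can be put side by side in a small Java sketch. Everything here is hypothetical: `UnknownSymbolPolicy`, `UnknownSymbolException` and the `resolve` helper are illustrative names, not Avro API, and the sketch does not reflect what GenericDatumReader, ReflectDatumReader or ResolvingDecoder currently do.

```java
// Hypothetical sketch of the two candidate rules; not Avro API.
final class EnumPolicyResolution {
    // Which rule the spec might mandate for an unknown writer symbol.
    enum UnknownSymbolPolicy { ERROR, UNSET }

    static final class UnknownSymbolException extends RuntimeException {
        UnknownSymbolException(String symbol, Class<?> readerEnum) {
            super("writer symbol '" + symbol + "' is not in reader enum " + readerEnum.getSimpleName());
        }
    }

    static <E extends Enum<E>> E resolve(Class<E> readerEnum, String writerSymbol, UnknownSymbolPolicy policy) {
        for (E constant : readerEnum.getEnumConstants()) {
            if (constant.name().equals(writerSymbol)) {
                return constant;
            }
        }
        if (policy == UnknownSymbolPolicy.ERROR) {
            throw new UnknownSymbolException(writerSymbol, readerEnum); // "an error is signaled"
        }
        return null; // UNSET: the field is left null, as a Java Enum field may be
    }

    enum Color { RED, GREEN }

    public static void main(String[] args) {
        System.out.println(resolve(Color.class, "GREEN", UnknownSymbolPolicy.ERROR)); // GREEN
        System.out.println(resolve(Color.class, "BLUE", UnknownSymbolPolicy.UNSET));  // null
        try {
            resolve(Color.class, "BLUE", UnknownSymbolPolicy.ERROR);
        } catch (UnknownSymbolException e) {
            // writer symbol 'BLUE' is not in reader enum Color
            System.out.println(e.getMessage());
        }
    }
}
```

Whichever policy is chosen, all readers would need to apply the same one for the code to conform to the spec.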

> Clarify resolution for enums (and fix code)
> -------------------------------------------
>                 Key: AVRO-75
>                 URL: https://issues.apache.org/jira/browse/AVRO-75
>             Project: Avro
>          Issue Type: Bug
>          Components: spec
>            Reporter: Raymie Stata
>            Assignee: Doug Cutting
> The current resolution rule for enums says: "if the writer's symbol is not present in
> the reader's enum, then the enum value is unset."  This is the only place the word "unset"
> is used in the doc, so it's not clear what you mean.  The code seems to be inconsistent:
> GenericDatumReader will happily return a symbol the reader doesn't understand;
> ReflectDatumReader will probably throw a class-not-found exception; ResolvingDecoder throws an error.
> I propose that we declare this case an error, i.e., rewrite the spec to "if the writer's
> symbol is not listed in the reader's enum, an error is signaled."  GenericDatumReader should
> be updated to throw an error in this case.
> If we decide to stick with the "unset" language, we need to define what "unset" means
> (and, if necessary, update ReflectDatumReader and ResolvingDecoder).

This message is automatically generated by JIRA.
