avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-29) Validation and resolution for ValueInput/ValueOutput
Date Thu, 25 Jun 2009 19:46:07 GMT

    [ https://issues.apache.org/jira/browse/AVRO-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724231#action_12724231

Doug Cutting commented on AVRO-29:

I just reverted the changes to GenericDatumReader, as I'm unsure of the performance impacts.
 Perhaps we should reconsider this in a new Jira issue.  If parsing tables must be reused
for good performance, then we might cache them in a WeakIdentityHashMap<Schema,ParsingTable>,
so that naive applications do not suffer.
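
To make the caching idea concrete, here is a minimal sketch of an identity-keyed cache with weakly held keys. It is only an illustration, not code from the patch: the class name WeakIdentityCache, its get method, and the builder parameter are all hypothetical, and it uses only JDK classes rather than Avro's own map.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical helper: caches one derived value per key instance, holding keys weakly. */
final class WeakIdentityCache<K, V> {

    /** Weak key wrapper that compares by reference identity rather than equals(). */
    private static final class IdentityKey<K> extends WeakReference<K> {
        private final int hash;

        IdentityKey(K referent, ReferenceQueue<K> queue) {
            super(referent, queue);
            // Remember the hash so a cleared key can still be located and removed.
            this.hash = System.identityHashCode(referent);
        }

        @Override
        public int hashCode() {
            return hash;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof IdentityKey)) return false;
            Object mine = get();
            return mine != null && mine == ((IdentityKey<?>) other).get();
        }
    }

    private final ConcurrentHashMap<IdentityKey<K>, V> map = new ConcurrentHashMap<>();
    private final ReferenceQueue<K> queue = new ReferenceQueue<>();

    /** Returns the cached value for this exact key instance, building it on first use. */
    V get(K key, Function<K, V> builder) {
        expungeStaleEntries();
        return map.computeIfAbsent(new IdentityKey<>(key, queue), k -> builder.apply(key));
    }

    /** Drops entries whose keys have been garbage collected. */
    private void expungeStaleEntries() {
        Object stale;
        while ((stale = queue.poll()) != null) {
            map.remove(stale);
        }
    }
}
```

A reader would call something like cache.get(schema, s -> buildTable(s)), where buildTable is again a placeholder. One caveat of this design: if the cached value holds a strong reference back to its key, the entry can never be collected.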

Also, as I should have mentioned with the commit, I made a few changes to the patch:
 - made references to ParsingTable and ResolvingTable package-private so that they would
not show up in end-user javadoc;
 - fixed javadoc comments that still linked to ValueReader/Writer; and
 - moved parsing.html into the javadoc tree, linked to from these classes' javadoc.

> Validation and resolution for ValueInput/ValueOutput
> ----------------------------------------------------
>                 Key: AVRO-29
>                 URL: https://issues.apache.org/jira/browse/AVRO-29
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Raymie Stata
>            Assignee: Thiruvalluvan M. G.
>             Fix For: 1.0.0
>         Attachments: AVRO-29.patch, AVRO-29.patch, AVRO-29.patch
> This is a companion to AVRO-25, which introduced the classes ValueOutput and ValueInput.
 This patch adds two capabilities: validation of ValueInput/Output calls against a schema,
and schema-resolution implemented in the context of ValueInput.
> ValidatingValueInput and ValidatingValueOutput each take a schema and validate calls
against it.  For example, if the schema calls for a record consisting of two longs and
a double, then ValidatingValueInput will allow the call sequence readLong, readLong, readDouble
and throw an error otherwise.
> ResolvingValueInput takes two schemas, the writer's and the reader's schema, and automatically
performs Avro's schema-resolution logic on behalf of the reader.  For example, if the writer's
schema calls for a long, and the reader's calls for a double, then the reader can call readDouble,
and ResolvingValueInput will automatically decode the long sent by the writer and convert
it into the double expected by the reader.
> ResolvingValueInput is an alternative to Avro's current GenericDatumReader, which also
implements Avro's resolution logic.  In many use-cases, the programmer has their own data
structures into which they want to store data read from an Avro stream, data structures that
cannot easily be put into the GenericRecord/Array class hierarchy.  With ResolvingValueInput,
programmers get the benefit of this resolution logic without being forced into the GenericRecord/Array
class hierarchy.
> We recommend that ResolvingValueInput become the standard implementation of the resolution
logic, and that GenericDatumReader be implemented in terms of ResolvingValueInput.  However,
we haven't implemented this change pending feedback from others.
> We haven't implemented default values, but can add that feature.
> Implementation note: this patch is implemented by translating Avro schemas to LL(1) parsing
tables.  This translation is straightforward, but tedious.  If you want to understand how
the code works, we recommend that you look in the file "parsing.html" (included in the patch),
which explains the translation.
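
As a rough illustration of the two behaviors described in the quoted issue, the sketch below validates read calls against a reader's schema and promotes a writer's long to a reader's double. It is not the AVRO-29 implementation. The names Sym and SketchResolvingInput are hypothetical, the "schema" is a flat list of primitive symbols (a degenerate stand-in for a real LL(1) parsing table, which must also handle nesting, unions, and arrays), and an in-memory list of values stands in for the encoded stream.

```java
import java.util.Iterator;
import java.util.List;

/** Terminal symbols for a toy schema language with two primitive types. */
enum Sym { LONG, DOUBLE }

/** Hypothetical sketch: validates read calls and resolves writer longs to reader doubles. */
final class SketchResolvingInput {
    private final Iterator<Sym> writer; // what the writer encoded, in order
    private final Iterator<Sym> reader; // what the reader's schema expects, in order
    private final Iterator<?> data;     // stand-in for the encoded stream

    SketchResolvingInput(List<Sym> writerSchema, List<Sym> readerSchema, List<?> values) {
        // Resolve up front so that incompatible schemas fail before any data is read.
        if (writerSchema.size() != readerSchema.size()) {
            throw new IllegalArgumentException("field count mismatch");
        }
        for (int i = 0; i < writerSchema.size(); i++) {
            Sym w = writerSchema.get(i);
            Sym r = readerSchema.get(i);
            boolean resolvable = w == r || (w == Sym.LONG && r == Sym.DOUBLE);
            if (!resolvable) {
                throw new IllegalArgumentException("cannot resolve " + w + " to " + r);
            }
        }
        this.writer = writerSchema.iterator();
        this.reader = readerSchema.iterator();
        this.data = values.iterator();
    }

    long readLong() {
        advance(Sym.LONG); // a reader LONG can only have been written as a LONG
        return (Long) data.next();
    }

    double readDouble() {
        Sym written = advance(Sym.DOUBLE);
        Object raw = data.next();
        // Resolution step: widen a written long to the double the reader asked for.
        return written == Sym.LONG ? ((Long) raw).doubleValue() : (Double) raw;
    }

    /** Validation step: the call must match the next symbol in the reader's schema. */
    private Sym advance(Sym called) {
        if (!reader.hasNext()) {
            throw new IllegalStateException("no more fields, but read of " + called + " requested");
        }
        Sym expected = reader.next();
        if (expected != called) {
            throw new IllegalStateException("expected " + expected + " but got " + called);
        }
        return writer.next();
    }
}
```

With a writer schema of three longs and a reader schema of two longs and a double, the call sequence readLong, readLong, readDouble succeeds and the last call returns the written long as a double. Any other sequence throws IllegalStateException.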

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
