avro-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-29) Validation and resolution for ValueInput/ValueOutput
Date Wed, 17 Jun 2009 21:52:07 GMT

    [ https://issues.apache.org/jira/browse/AVRO-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720921#action_12720921

Doug Cutting commented on AVRO-29:

Some comments and questions:
- What's the advantage of using ResovlingRecordReader over the existing code in GenericDatumReader?
 This replaces existing logic with a lot more code, code that's harder to understand, IMHO.
 It's interesting that it can be done this way, but what value does this give applications?
 Above Raymie states, "With ResolvingValueInput, programmers get the benefit of this resolution
logic without being forced into the GenericRecord/Array class hierarchy." But applications
already are not forced into this.  They simply must override addField(), getField() and removeField(),
as with ResolvingRecordReader.  For example, SpecificDatumReader extends GenericDatumReader
to read records that do not implement GenericRecord.  What am I missing?
 - Similarly, what's the advantage of the ValidatingValueReader/Writer?  It feels a bit like
belt-and-suspenders.  The existing DatumReader and DatumWriter implementations do not perform
invalid operations.  This is tested by unit tests, so why check it again at runtime for every
record, even if it can be done relatively cheaply?  We don't expect there to be thousands
of DatumReader and DatumWriter implementations.  All existing ones extend GenericDatumReader/Writer,
which should prevent invalid operations, and I have a hard time imagining an application which
cannot be reasonably implemented by extending these.  Again, what am I missing?
- GenericDatumReader: ValidatingValueReader import is not used;
- GenericDatumReader: It's unfortunate that this includes two versions of readRecord.
- Might the resolving/parsing classes would better live in an io.validate package?
- What should we do with the parsing.html file?  If it's for developers only, then it could
live with the source code (perhaps alongside the .java files), in javadoc or on the wiki.
 If we think end-users might be interested, then it should go into the forrest-generated docs.

> Validation and resolution for ValueInput/ValueOutput
> ----------------------------------------------------
>                 Key: AVRO-29
>                 URL: https://issues.apache.org/jira/browse/AVRO-29
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Raymie Stata
>            Assignee: Thiruvalluvan M. G.
>         Attachments: AVRO-29.patch, AVRO-29.patch
> This is a companion to AVRO-25, which introduced the classes ValueOutput and ValueInput.
 This patch adds two capabilities: validation of ValueInput/Output calls against a schema,
and schema-resolution implemented in the context of ValueInput.
> ValidatingValueInput and ValidatingValueOutput take a schema and will validate calls
against a schema.  For example, if the schema calls for a record consisting of two longs and
a double, then ValidatingOutput will allow the call-sequence readLong, readLong, readDouble
and throw an error otherwise.
> ResolvingValueInput takes two schemas, the writer's and the reader's schema, and automatically
performs Avro's schema-resolution logic on behalf of the reader.  For example, if the writer's
schema calls for a long, and the readers calls for a double, then the reader can call readDouble,
and ResolvingValueInput will automatically decode the long sent by the writer and convert
it into the double expected by the reader.
> ResolvingValueInput is an alternative to Avro's current GenericDatumReader, which also
implements Avro's resolution logic.  In many use-cases, the programmer has their own data
structures into which they want to store data read from an Avro stream, data structures that
cannot easily be put into the GenericRecord/Array class hierarchy.  With ResolvingValueInput,
programmers get the benefit of this resolution logic without being forced into the GenericRecord/Array
class hierarchy.
> We recommend that ResolvingValueInput become the standard implementation of the resolution
logic, and that GenericDatumReader be implemented in terms of ResolvingValueInput.  However,
we haven't implemented this change pending feedback from others.
> We haven't implemented default values, but can add that feature.
> Implementation note: this patch is implemented by translating Avro schemas to LL(1) parsing
tables.  This translation is straight forward, but tedious.  If you want to understand how
the code works, we recommend that you look in the file "parsing.html" (included in the patch),
which explains the translation.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message