avro-dev mailing list archives

From "Thiruvalluvan M. G. (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-29) Validation and resolution for ValueInput/ValueOutput
Date Thu, 18 Jun 2009 16:18:07 GMT

    [ https://issues.apache.org/jira/browse/AVRO-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721329#action_12721329 ]

Thiruvalluvan M. G. commented on AVRO-29:

  - The only real use of ValidatingValueReader/Writer is validation. It can be used for testing
new classes that directly use ValueReader/Writer objects. Since it is designed as a filter,
it can be inserted into the chain to detect corner-case bugs even in production environments.
At best it can be used for diagnostic purposes.
  - There are two versions of readRecord because ResolvingValueReader behaves differently
from ValueReader. ValueReader returns objects in the order of their declaration
in the reader's schema. ResolvingValueReader may return them in a different order, depending
on the writer's schema. If we can achieve reordering of fields (which is possible with some more
effort), then we can get rid of the second version of readRecord(). In fact, if the reader can
expect its contents in the order of its schema and if support for default values is added,
all the resolution becomes internal to the ResolvingValueReader. Any reader can then simply read as
if the data were serialized according to its own schema.
  - The parsing table can be considered a binary version of the schema. (There is some information
loss at present, but that can be taken care of.) One can define an Avro schema that serializes
the parsing table itself. With that, an RPC can send data along with its schema, which a receiver
can readily use to resolve against the receiver's schema. This is functionally equivalent to sending
the JSON version of the schema, but is more efficient. This is particularly useful for scatter/gather
RPCs, where many receivers receive the same request. The time saved could thus be significant.
  - Once we agree on the usefulness of these classes, we can move them around appropriately.
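
To make the field-reordering point concrete, here is a minimal sketch of what a resolving reader
would have to do to return values in the reader's declaration order. It is not code from the
patch: the class FieldReorderSketch and the method toReaderOrder are hypothetical names, records
are reduced to flat lists of field names and values, and default values are not modelled (a
reader field missing from the writer's data is treated as an error).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the AVRO-29 patch: reorders decoded field
// values from the writer's declaration order into the reader's.
final class FieldReorderSketch {

    static List<Object> toReaderOrder(List<String> writerFields,
                                      List<Object> valuesInWriterOrder,
                                      List<String> readerFields) {
        if (writerFields.size() != valuesInWriterOrder.size()) {
            throw new IllegalArgumentException("one value is required per writer field");
        }
        // Buffer each value under its field name as it arrives in writer order.
        Map<String, Object> byName = new HashMap<>();
        for (int i = 0; i < writerFields.size(); i++) {
            byName.put(writerFields.get(i), valuesInWriterOrder.get(i));
        }
        // Emit values in the reader's declaration order; writer-only fields are dropped.
        List<Object> result = new ArrayList<>();
        for (String name : readerFields) {
            if (!byName.containsKey(name)) {
                // Default values are not modelled, so a missing field is an error here.
                throw new IllegalStateException("no value for reader field: " + name);
            }
            result.add(byName.get(name));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> out = toReaderOrder(
            Arrays.asList("id", "score", "name"),          // writer's order
            Arrays.<Object>asList(7L, 0.5, "avro"),        // values as written
            Arrays.asList("name", "id"));                  // reader's order
        System.out.println(out); // prints [avro, 7]
    }
}
```

With reordering done inside the resolver like this, the caller sees values in its own schema
order, so a single readRecord() would suffice. The cost is that the whole record is buffered
before anything is returned.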

> Validation and resolution for ValueInput/ValueOutput
> ----------------------------------------------------
>                 Key: AVRO-29
>                 URL: https://issues.apache.org/jira/browse/AVRO-29
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Raymie Stata
>            Assignee: Thiruvalluvan M. G.
>         Attachments: AVRO-29.patch, AVRO-29.patch
> This is a companion to AVRO-25, which introduced the classes ValueOutput and ValueInput.
 This patch adds two capabilities: validation of ValueInput/Output calls against a schema,
and schema-resolution implemented in the context of ValueInput.
> ValidatingValueInput and ValidatingValueOutput take a schema and will validate calls
against a schema.  For example, if the schema calls for a record consisting of two longs and
a double, then ValidatingValueInput will allow the call-sequence readLong, readLong, readDouble
and throw an error otherwise.
> ResolvingValueInput takes two schemas, the writer's and the reader's schema, and automatically
performs Avro's schema-resolution logic on behalf of the reader.  For example, if the writer's
schema calls for a long, and the reader's calls for a double, then the reader can call readDouble,
and ResolvingValueInput will automatically decode the long sent by the writer and convert
it into the double expected by the reader.
> ResolvingValueInput is an alternative to Avro's current GenericDatumReader, which also
implements Avro's resolution logic.  In many use-cases, the programmer has their own data
structures into which they want to store data read from an Avro stream, data structures that
cannot easily be put into the GenericRecord/Array class hierarchy.  With ResolvingValueInput,
programmers get the benefit of this resolution logic without being forced into the GenericRecord/Array
class hierarchy.
> We recommend that ResolvingValueInput become the standard implementation of the resolution
logic, and that GenericDatumReader be implemented in terms of ResolvingValueInput.  However,
we haven't implemented this change pending feedback from others.
> We haven't implemented default values, but can add that feature.
> Implementation note: this patch is implemented by translating Avro schemas to LL(1) parsing
tables.  This translation is straightforward, but tedious.  If you want to understand how
the code works, we recommend that you look in the file "parsing.html" (included in the patch),
which explains the translation.
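
The validation idea in the description can be sketched as a filter that checks each call against
the next expected symbol and then delegates. This is an illustration only, not the patch's code:
all names below (ValidatingSketch, Out, ValidatingOut, RecordingOut, Symbol) are hypothetical, the
sketch is shown for the output side (the input side would check read calls the same way), and the
LL(1) parsing table is reduced to a flat list of terminals, which is enough only for a record of
primitive fields.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; these names are not the classes in the AVRO-29 patch.
final class ValidatingSketch {

    enum Symbol { LONG, DOUBLE }

    /** Minimal output interface standing in for ValueOutput (assumed shape). */
    interface Out {
        void writeLong(long v);
        void writeDouble(double v);
    }

    /** Filter that validates each call against the expected symbols, then delegates. */
    static final class ValidatingOut implements Out {
        private final List<Symbol> expected;
        private final Out next;
        private int pos = 0;

        ValidatingOut(List<Symbol> expected, Out next) {
            this.expected = expected;
            this.next = next;
        }

        private void advance(Symbol actual) {
            if (pos >= expected.size()) {
                throw new IllegalStateException("unexpected " + actual + " after end of record");
            }
            Symbol want = expected.get(pos);
            if (want != actual) {
                throw new IllegalStateException(
                    "expected " + want + " but got " + actual + " at position " + pos);
            }
            pos++;
        }

        @Override public void writeLong(long v) { advance(Symbol.LONG); next.writeLong(v); }
        @Override public void writeDouble(double v) { advance(Symbol.DOUBLE); next.writeDouble(v); }

        /** True once every expected symbol has been written. */
        boolean isComplete() { return pos == expected.size(); }
    }

    /** Sink that records the calls it receives as text, standing in for a real encoder. */
    static final class RecordingOut implements Out {
        final StringBuilder log = new StringBuilder();
        @Override public void writeLong(long v) { log.append('L').append(v).append(' '); }
        @Override public void writeDouble(double v) { log.append('D').append(v).append(' '); }
    }

    public static void main(String[] args) {
        // Schema: a record consisting of two longs and a double.
        List<Symbol> schema = Arrays.asList(Symbol.LONG, Symbol.LONG, Symbol.DOUBLE);
        RecordingOut sink = new RecordingOut();
        ValidatingOut out = new ValidatingOut(schema, sink);
        out.writeLong(1L);
        out.writeLong(2L);
        out.writeDouble(3.5);
        System.out.println(sink.log.toString().trim()); // prints L1 L2 D3.5

        try {
            // A double where the schema calls for a long is rejected before reaching the sink.
            new ValidatingOut(schema, new RecordingOut()).writeDouble(1.0);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the validator implements the same interface as the object it wraps, it can be inserted
into or removed from the chain without changing the calling code.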

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
