avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-25) Blocking for value output (with API change)
Date Thu, 04 Jun 2009 20:33:07 GMT

    [ https://issues.apache.org/jira/browse/AVRO-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716391#action_12716391 ]

Doug Cutting commented on AVRO-25:

Thanks!  The patch is much easier to read.  I can now see what's changed in ValueReader/Writer:
- doRead() might better be named doReadLong()
- doSkip() might better be named doSkipBytes() -- with no parameters to distinguish it from
- ByteReader and readString() duplicate logic -- perhaps we should have a doReadBytes() used
by both?
- ByteReader/ByteWriter use instanceof in a bad way -- maybe they should have two constructors
- in GenericDatumWriter, still casting to GenericFixed
- in GenericDatumWriter, why is arraySize now int rather than long?
- why does GenericArray#size() now return int rather than long?
- why are array item counts ints rather than longs?  Pig, for example, has containers that
are paged to disk which might have more than 2^32 elements.  I don't see anything that's gained
by using int here.
- changes to TestFSData and TestSchema seem spurious?
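
The int-vs-long concern above can be made concrete. A minimal demo (not Avro code; the class and method names here are made up for illustration) of why a paged, disk-backed container such as Pig's needs a long count: narrowing a count above Integer.MAX_VALUE to int silently overflows.

```java
// Demo: why array item counts should be long, not int.
// A container paged to disk can hold more elements than an int can count.
public class CountDemo {
    // Three billion elements fits comfortably in a long...
    static final long BIG = 3_000_000_000L;

    // ...but a lossy narrowing conversion to int wraps around.
    static int asInt(long n) {
        return (int) n;
    }

    public static void main(String[] args) {
        System.out.println(BIG);        // fine as a long
        System.out.println(asInt(BIG)); // negative: the count overflowed
    }
}
```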

> Blocking for value output (with API change)
> -------------------------------------------
>                 Key: AVRO-25
>                 URL: https://issues.apache.org/jira/browse/AVRO-25
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Raymie Stata
>            Assignee: Thiruvalluvan M. G.
>         Attachments: AVRO-25.patch, AVRO-25.patch, AVRO-25.patch, AVRO-25.patch, AVRO-25.patch
> The Avro specification has provisions for decomposing very large arrays and maps into
"blocks."  These provisions allow for streaming implementations that would allow one to, for
example, write the contents of a file out as an Avro array w/out knowing in advance how many
records are in the file.
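
The spec's blocking scheme can be sketched as follows. This is a simplified illustration, not the patch's implementation: an array is written as a series of blocks, each a count followed by that many items, terminated by a zero count, so the total item count never needs to be known up front. (Real Avro encodes counts as zig-zag varints; fixed-width longs are used here purely for clarity.)

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of block-encoded array output: stream items of unknown total
// count by emitting (count, items...) blocks and a final zero count.
public class BlockedArrayWriter {
    public static byte[] write(Iterator<Integer> items, int blockSize) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            while (items.hasNext()) {
                // Buffer up to blockSize items so their count can precede them.
                List<Integer> block = new ArrayList<>();
                while (items.hasNext() && block.size() < blockSize) {
                    block.add(items.next());
                }
                out.writeLong(block.size());          // block's item count
                for (int v : block) out.writeInt(v);  // the items themselves
            }
            out.writeLong(0);                         // zero count ends the array
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);        // cannot happen on a byte buffer
        }
    }
}
```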
> The current Java implementation of Avro does not support this provision.  My colleague Thiru
will be attaching a patch which implements blocking.  It turns out that the buffering required
to do blocking is non-trivial, so it seems beneficial to include a standard implementation
of blocking as part of the reference Avro implementation.
> This is an early version of the code.  We are still working on testing and performance
tuning.  But we wanted early feedback.
> This patch also includes a new set of classes called ValueInput and ValueOutput, which
are meant to replace ValueReader and ValueWriter.  These classes have largely the same API
as ValueReader/Writer, but they include a few more methods to "bracket" items that appear
inside of arrays and maps.  Shortly, we'll be posting a separate patch which implements further
subclasses of ValueInput/Output that do "validation" of input and output against a schema
(and also do automatic schema resolution for readers).
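
The "bracketing" idea might look something like the sketch below. The method names are assumptions, not the actual patch's API: the caller announces where an array opens, where each item starts, and where the array closes, which lets the writer buffer items and emit a count only once the array is complete.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of bracketing calls around array items.
// Items are buffered between writeArrayStart() and writeArrayEnd(),
// so the block count is known when the array is closed.
public class BracketingWriter {
    private final List<Long> buffered = new ArrayList<>();
    private final List<String> output = new ArrayList<>();

    public void writeArrayStart() {
        buffered.clear();                        // begin a new array
    }

    public void startItem(long value) {
        buffered.add(value);                     // bracket one item
    }

    public void writeArrayEnd() {
        output.add("count=" + buffered.size());  // count learned only at close
        for (long v : buffered) output.add("item=" + v);
        output.add("count=0");                   // terminator
    }

    public List<String> output() {
        return output;
    }
}
```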
> We're implementing these classes separately from ValueReader/Writer to allow you to kick
our tires w/out causing too much disruption to your source trees.  Let's validate the basic
idea behind these patches first, and then determine the details of integrating them into the
rest of Avro.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
