avro-dev mailing list archives

From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-459) Allow lazy reading of large fields from data files
Date Fri, 12 Mar 2010 04:01:27 GMT

    [ https://issues.apache.org/jira/browse/AVRO-459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844349#action_12844349 ]

Aaron Kimball commented on AVRO-459:

I have a use case for creating files where individual fields are very large (possibly hundreds
of MB), and I would like to be able to store these records in Avro files. The large fields in
question are just byte arrays; a record contains one such field and possibly an identifier of
some sort.

The byte array itself may be too big to materialize in RAM. It would be good to have
a "lazy" reader which can seek to an arbitrary record boundary and then return an InputStream
(or a Reader for character-based data), so that I can pull in more of the field's contents
as I need to process them. It would be even better if the returned stream could seek past
uninteresting parts of the byte array, all the way to its end.

Likewise, iterating over the records in a file with the file reader should just seek past
these fields rather than scan their entire contents (even if I make use of other fields
of the same record).
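
A rough sketch of what such a lazy field reader might look like follows. It is not an existing
Avro API: the class LazyBytesField and its methods are hypothetical names used only for
illustration. It relies on the Avro binary encoding of a "bytes" value, which is a zig-zag
variable-length long giving the length, followed by that many bytes. It also assumes the
underlying stream is uncompressed and that its skip() is cheap (for example a file stream).

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch (not part of Avro): lazy access to one "bytes" field. */
class LazyBytesField extends InputStream {
    private final InputStream in; // positioned just after the length prefix
    private long remaining;       // bytes of this field not yet consumed

    /** Reads only the length prefix; the payload itself stays unread. */
    LazyBytesField(InputStream in) throws IOException {
        this.in = in;
        this.remaining = readLong(in);
        if (remaining < 0) throw new IOException("negative field length");
    }

    /** Decodes an Avro long: variable-length, zig-zag encoded. */
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated length prefix");
            raw |= (long) (b & 0x7f) << shift;
            if ((b & 0x80) == 0) break;   // high bit clear: last byte
            shift += 7;
            if (shift > 63) throw new IOException("length prefix too long");
        }
        return (raw >>> 1) ^ -(raw & 1);  // undo zig-zag
    }

    long remaining() { return remaining; }

    @Override
    public int read() throws IOException {
        if (remaining == 0) return -1;    // end of this field, not of the file
        int b = in.read();
        if (b < 0) throw new EOFException("field truncated");
        remaining--;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (remaining == 0) return -1;
        int n = in.read(buf, off, (int) Math.min(len, remaining));
        if (n < 0) throw new EOFException("field truncated");
        remaining -= n;
        return n;
    }

    /** Skips within the field only; never runs past the field's end. */
    @Override
    public long skip(long n) throws IOException {
        if (n <= 0) return 0;
        long skipped = in.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    /** Moves the underlying stream to the end of the field without buffering it. */
    void skipToEnd() throws IOException {
        while (remaining > 0) {
            // skip() is allowed to return 0, so fall back to reading one byte.
            if (skip(remaining) == 0 && read() < 0) break;
        }
    }

    public static void main(String[] args) throws IOException {
        // Two bytes fields back to back: "hello" (length 5) and "abc" (length 3).
        byte[] data = {0x0A, 'h', 'e', 'l', 'l', 'o', 0x06, 'a', 'b', 'c'};
        InputStream in = new ByteArrayInputStream(data);

        LazyBytesField first = new LazyBytesField(in);
        System.out.println("first field length: " + first.remaining());
        first.skipToEnd(); // "hello" is never materialized

        LazyBytesField second = new LazyBytesField(in);
        StringBuilder sb = new StringBuilder();
        for (int b; (b = second.read()) >= 0; ) sb.append((char) b);
        System.out.println("second field: " + sb);
    }
}
```

In this sketch, a record iterator that does not care about the large field would construct
the wrapper and call skipToEnd() right away, while a caller that does care would read from
it incrementally like any other InputStream.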

> Allow lazy reading of large fields from data files
> --------------------------------------------------
>                 Key: AVRO-459
>                 URL: https://issues.apache.org/jira/browse/AVRO-459
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Aaron Kimball
> The current file reader will attempt to materialize individual fields entirely in RAM.
> If a record is too big to fit in RAM, it would be good to get a stream-based API to very large
