avro-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Is there a way to conditionally read Avro data?
Date Sat, 17 Aug 2013 15:48:31 GMT
What Eric suggests (reader schemas) would work, but it may incur a double
read cost: if the condition checked by that partial read is met and you
want to proceed, you have to read the record again with the full schema.
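
To make that cost concrete, here is a stdlib-only Java simulation (not the Avro library) of the two-pass pattern. It assumes a hypothetical, flattened writer schema `record R { string type; bytes body; }` in Avro's binary encoding. Pass one behaves like a projection reader schema: it decodes `type` and steps over `body`. Pass two behaves like a plain read with the full schema, so every record's `type` is decoded a second time. The names `TwoPassRead`, `projectionPass`, `fullPass` and `typeDecodes` are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stdlib-only simulation, NOT the Avro library. Hypothetical writer schema,
// flattened: record R { string type; bytes body; } in Avro binary encoding.
final class TwoPassRead {
    private final byte[] buf;
    private int pos;
    int typeDecodes; // how many times a "type" value was decoded, across both passes

    TwoPassRead(byte[] buf) { this.buf = buf; }

    // Avro long: zigzag-encode, then base-128 varint, low-order group first.
    private static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // 7 data bits + continuation flag
            z >>>= 7;
        }
        out.write((int) z);
    }

    static byte[] encode(String type, String body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String s : new String[] {type, body}) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            writeLong(out, b.length); // string and bytes values are length-prefixed
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    private long readLong() {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1); // undo zigzag
    }

    private byte[] readBytes() {
        int len = (int) readLong();
        byte[] b = Arrays.copyOfRange(buf, pos, pos + len);
        pos += len;
        return b;
    }

    private String readType() {
        typeDecodes++;
        return new String(readBytes(), StandardCharsets.UTF_8);
    }

    // Pass 1, like reading with a projection ("RSansBody") reader schema:
    // decode the type, step over the body without copying it.
    List<Boolean> projectionPass(String rejectType) {
        pos = 0;
        List<Boolean> keep = new ArrayList<>();
        while (pos < buf.length) {
            keep.add(!readType().equals(rejectType));
            pos += (int) readLong(); // skip the body by its length prefix
        }
        return keep;
    }

    // Pass 2, like re-reading with the full writer schema: every record is
    // decoded again from the start, type included, and every body is copied.
    List<String> fullPass(List<Boolean> keep) {
        pos = 0;
        List<String> bodies = new ArrayList<>();
        for (boolean k : keep) {
            readType();
            String body = new String(readBytes(), StandardCharsets.UTF_8);
            if (k) bodies.add(body);
        }
        return bodies;
    }
}
```

With three records, one of type `ComputerVirus`, `typeDecodes` ends at 6 rather than 3, and the plain full-schema pass still decodes the rejected body.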

If this data is held early in the record (field order matters here),
then a custom DatumReader implementation that does the low-level
deserialization itself may work more effectively. You can pass a
DatumReader when constructing a DataFileReader, but it's quite a long
route to go, IMO.
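
A rough sketch of the single-pass alternative, again stdlib-only Java and not Avro's actual `DatumReader` interface. With the same hypothetical schema `record R { string type; bytes body; }`, the reader decodes the leading `type` field and, for a rejected type, jumps over `body` using its length prefix, so those bytes are never copied or decoded. `ConditionalReader` and `readAll` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Stdlib-only sketch of a hand-written, single-pass conditional reader; NOT
// Avro's DatumReader API. Hypothetical writer schema, flattened:
//   record R { string type; bytes body; }  (Avro binary encoding, no file framing)
final class ConditionalReader {
    private final byte[] buf;
    private int pos;

    ConditionalReader(byte[] buf) { this.buf = buf; }

    // Avro long: zigzag-encode, then base-128 varint, low-order group first.
    private static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // 7 data bits + continuation flag
            z >>>= 7;
        }
        out.write((int) z);
    }

    static byte[] encode(String type, String body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String s : new String[] {type, body}) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            writeLong(out, b.length); // string and bytes values are length-prefixed
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    private long readLong() {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1); // undo zigzag
    }

    // Reads a length prefix and checks it against the remaining buffer.
    private int readLength() {
        long n = readLong();
        if (n < 0 || n > buf.length - pos) throw new IllegalStateException("bad length " + n);
        return (int) n;
    }

    private String readString() {
        int len = readLength();
        String s = new String(buf, pos, len, StandardCharsets.UTF_8);
        pos += len;
        return s;
    }

    // Single pass: decode the leading "type" field; if it equals rejectType,
    // jump over the body by its length prefix without copying or decoding it.
    static List<String> readAll(byte[] data, String rejectType) {
        ConditionalReader r = new ConditionalReader(data);
        List<String> kept = new ArrayList<>();
        while (r.pos < data.length) {
            String type = r.readString();
            if (type.equals(rejectType)) {
                r.pos += r.readLength(); // skip: the body bytes are never touched
            } else {
                kept.add(r.readString()); // body decoded as UTF-8 text for the demo
            }
        }
        return kept;
    }
}
```

This only works cheaply because `body` here is a single length-prefixed value. A body made of nested records, unions, or arrays has no overall length prefix, so a real implementation would have to walk the writer schema field by field (Avro's `Decoder` has `skipString()`, `skipBytes()`, `skipArray()` and similar methods for this). The sketch also ignores the container file's block framing, sync markers, and compression codecs.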

On Sat, Aug 17, 2013 at 4:17 AM, Eric Wasserman <ewasserman@247-inc.com> wrote:
> If you write your records with a schema like this (in Avro IDL, for
> brevity):
>
>   record R {
>     Header header;
>     Body body;
>   }
>
> Then you can read with a schema like this:
>
>   record RSansBody {
>     Header header;
>   }
>
> And the Avro libraries will read the header part (in which your "type" would
> reside) and effectively skip the body part.
>
> ________________________________
> From: Anna Lahoud <annalahoud@gmail.com>
> Sent: Friday, August 16, 2013 12:23 PM
> To: user@avro.apache.org
> Subject: Is there a way to conditionally read Avro data?
>
> I am wondering if there is a way that I can avoid reading all of an item in
> an Avro file, based on some of the data that I have already read. For
> instance, say I have a datum whose 'type' value is 'ComputerVirus', and in
> that case I do not want to touch the remaining fields. Is there a way to
> 'move on' and get the next datum without touching the remainder of the
> scary datum? I would call it a 'conditional read', in that I only want to
> fully read the datum if it meets some criteria.
>
> Anna
>



-- 
Harsh J
