avro-user mailing list archives

From java8964 java8964 <java8...@hotmail.com>
Subject benefits to store the data as key/value in avro
Date Sat, 05 Oct 2013 15:41:13 GMT
Hi,
I have a question about how to store my data in Avro. Right now I have two options. The first
is to serialize the whole object as one Avro record, like the following:
foo {id1 long,id2 long,id3 long,data record}
I know most of my data will be queried by id1, id2, or id3, in MR jobs, Hive, or Pig.
So I am thinking that I could instead store my data as key/value pairs in Avro:
composite_key {id1 long,id2 long,id3 long}
value {data record}
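To make the two layouts concrete, here is a rough sketch of both as Avro schema JSON, built as plain Python dicts. The contents of the data record are not shown in this message, so its single "payload" field is a hypothetical placeholder. The Pair record name and namespace follow what I understand org.apache.avro.mapred.Pair to use, which is an assumption on my part.

```python
import json

# Placeholder for the "data record". Its real fields are not given here,
# so the single "payload" field is purely hypothetical.
DATA = {
    "type": "record",
    "name": "data",
    "fields": [{"name": "payload", "type": "bytes"}],
}

# The three ID fields shared by both layouts.
ID_FIELDS = [{"name": n, "type": "long"} for n in ("id1", "id2", "id3")]

# Option 1: one flat record holding the IDs and the data record.
FOO = {
    "type": "record",
    "name": "foo",
    "fields": ID_FIELDS + [{"name": "data", "type": DATA}],
}

# Option 2: the IDs grouped into a composite key, paired with the data record.
COMPOSITE_KEY = {"type": "record", "name": "composite_key", "fields": ID_FIELDS}
PAIR = {
    "type": "record",
    "name": "Pair",                         # assumed name of the pair record
    "namespace": "org.apache.avro.mapred",  # assumed namespace
    "fields": [
        {"name": "key", "type": COMPOSITE_KEY},
        {"name": "value", "type": DATA},
    ],
}

if __name__ == "__main__":
    print(json.dumps(FOO, indent=2))
    print(json.dumps(PAIR, indent=2))
```

Both layouts carry the same four pieces of information; the second only nests them under key and value.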
My question is: what benefit does the second format bring? If the data is stored as
Pair(composite_key, value) in Avro files on HDFS, and most queries filter on id1 to id3,
will I save IO during the scan? That is, will Avro deserialize only the ID fields for
most records in an MR job?
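To illustrate what I mean by reading only the ID fields: the sketch below is a hypothetical reader (projection) schema for the first layout, written as a Python dict, that lists just id1 to id3 and leaves out the data record. Whether reading with a schema like this actually reduces IO, or only avoids materializing the omitted field, is exactly what I am unsure about.

```python
import json

# Hypothetical projection schema for the first layout: same record name as
# the writer schema "foo", but only the three ID fields are listed.
FOO_IDS_ONLY = {
    "type": "record",
    "name": "foo",
    "fields": [{"name": n, "type": "long"} for n in ("id1", "id2", "id3")],
}

if __name__ == "__main__":
    print(json.dumps(FOO_IDS_ONLY, indent=2))
```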
If I don't get that benefit, then I don't see any reason to store the data in the key/value
format, since the first format will be good enough for most cases, right?
Thanks
Yong