avro-user mailing list archives

From Scott Carey <sc...@richrelevance.com>
Subject Re: Avro vs. Hadoop serialization performance?
Date Mon, 14 Mar 2011 21:23:26 GMT
If you're I/O bound, Avro will be faster.  Avro's raw field serialization
is very fast, but some types of object marshaling are not yet that fast.
Hadoop's Writables aren't all that fast themselves anyway.

I don't know of any public direct benchmarks comparing the two in a
standard Hadoop MapReduce.

When attempted with Pig, Avro was faster (PIG-794):
Storage       Time spent on job_1  Output size of job_1  Mapper task number of job_2  Time spent on job_2  Total spent time on pig script
AvroStorage   3min 51 sec          7.96G                 120                          17min 09 sec         21min 0 sec
InterStorage  4min 33 sec          9.55G                 143                          17min 17 sec         21min 50 sec
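
To put the PIG-794 figures in proportion, here is a minimal sketch that computes how far each AvroStorage number falls below the matching InterStorage number. The values are copied from the table above; the helper names (`seconds`, `relative_saving`) and the dictionary keys are made up for this illustration and are not part of Pig or Avro.

```python
# Figures copied from the PIG-794 table; times are converted to seconds.
# Helper and key names are illustrative only, not Pig or Avro APIs.

def seconds(minutes, secs):
    """Convert a 'Xmin Y sec' reading to seconds."""
    return minutes * 60 + secs

avro_storage = {
    "job_1_time_s": seconds(3, 51),
    "job_1_output_g": 7.96,        # unit as given in the table ("G")
    "job_2_mapper_tasks": 120,
    "job_2_time_s": seconds(17, 9),
    "total_time_s": seconds(21, 0),
}

inter_storage = {
    "job_1_time_s": seconds(4, 33),
    "job_1_output_g": 9.55,
    "job_2_mapper_tasks": 143,
    "job_2_time_s": seconds(17, 17),
    "total_time_s": seconds(21, 50),
}

def relative_saving(avro_value, inter_value):
    """Fraction by which the AvroStorage figure is below the InterStorage one."""
    return (inter_value - avro_value) / inter_value

savings = {
    key: relative_saving(avro_storage[key], inter_storage[key])
    for key in avro_storage
}

for key, value in savings.items():
    print(f"{key}: {value:.1%}")
```

On these numbers the gain is concentrated in job_1 and in the intermediate output: job_1 time, output size, and job_2 mapper count are each roughly 15-17% lower with AvroStorage, while the total script time differs by only about 4%.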

On 3/14/11 1:59 PM, "Aleksey Maslov" <Aleksey.Maslov@Lab49.com> wrote:

>Has there been any benchmarking done to determine which serialization
>architecture is better - Hadoop vs. Avro?
>I understand Avro has language neutrality as its big plus, but what about
>the perf?
>And yes, it's a loaded question - it all depends on the nature of the data:
>vs. numeric - but still, are they close?
