avro-user mailing list archives

From felix gao <gre1...@gmail.com>
Subject Re: Avro speed comparison with raw logs
Date Thu, 31 Mar 2011 01:26:35 GMT

I am using Avro 1.4.1. The problem is not that Avro is slow; AvroStorage
was doing a recursive schema validation, and that is what made it so slow.
It is fixed now.


On Fri, Mar 4, 2011 at 9:25 AM, Doug Cutting <cutting@apache.org> wrote:

> On 03/01/2011 09:05 PM, felix gao wrote:
> > I am running some comparison tests on a data set that I converted to
> > Avro with the deflate codec set to level 6. The original data consists
> > of 2880 uncompressed HTTP access logs with a total size of 1.4TB. The
> > compressed Avro log is about 2/3 of that size. However, when I ran the
> > Pig job on the raw logs, the initial map phase was blazing fast and
> > finished in under 40 min. When I ran the same Pig job on the Avro files,
> > the initial map phase took 8 minutes to finish only 10%. Is there any
> > way to figure out what is slowing down the map?
> What version of Avro are you using?  How are you integrating Avro with Pig?
> Also, for speed, you might try level=1 (Deflater.BEST_SPEED).
> Doug
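
For anyone who wants to see what the level setting changes before
re-encoding a large data set, here is a minimal sketch using only
java.util.zip. It does not use Avro or Pig at all: the class name
DeflateLevelDemo, the helper sampleLog, and the synthetic access-log lines
are made up for illustration. It only compares compression time and output
size at level 1 (Deflater.BEST_SPEED) and level 6 on repetitive text. In
Avro's Java API the same level is, as far as I know, passed via
CodecFactory.deflateCodec(int) to DataFileWriter.setCodec.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class DeflateLevelDemo {

    // Compress the whole input with the given deflate level (1 = fastest, 9 = smallest).
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflate data produced by compress(); used here only to check the round trip.
    static byte[] decompress(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unexpected input: stop instead of spinning
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt deflate stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Hypothetical stand-in for HTTP access-log lines; not the data from this thread.
    static byte[] sampleLog(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("10.0.0.").append(i % 256)
              .append(" - - \"GET /page/").append(i % 1000)
              .append(" HTTP/1.1\" 200 ").append(1000 + (i * 37) % 9000)
              .append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = sampleLog(200_000);
        for (int level : new int[] {Deflater.BEST_SPEED, 6}) {
            long start = System.nanoTime();
            byte[] packed = compress(raw, level);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("level=%d: %d -> %d bytes in %d ms%n",
                    level, raw.length, packed.length, ms);
        }
    }
}
```

Timings from a single run like this are noisy and depend on the JVM and
the data, so treat them as a rough indication only. The level mainly
affects how long writing takes and how large the files are; it would not
explain a slowdown caused by something else in the read path, such as the
schema validation mentioned at the top of this message.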
