avro-dev mailing list archives

From Scott Carey <sc...@richrelevance.com>
Subject Re: Avro 1.3 RC soon?
Date Wed, 03 Feb 2010 11:03:45 GMT
Well, as luck would have it, one minute after sending this, JIRA came back to life.

On Feb 3, 2010, at 2:46 AM, Scott Carey wrote:

> I have a patch well under way for overhauling BinaryDecoder performance.  I'll put the
partial patch up in the morning (new BinaryDecoder and test code), since it doesn't look like
JIRA will be up for a while.  It has said it will be back in '30 minutes' for the last 3 hours.
> I would like to get these changes in for 1.3 because there are some API semantic changes
that are better made now than later.  If we get too locked in, we may never be able
to bring the low-level decoding/encoding performance to its peak capability.
> Here is the would-be description of the ticket in case anyone wants to comment before
I can get it into JIRA:
> ------------------------------------------------
> BinaryDecoder has room for significant performance improvement.  [AVRO-327|https://issues.apache.org/jira/browse/AVRO-327]
has some preliminary work here, but in order to satisfy some use cases there is much more
work to do.
> I am opening a new ticket because the scope of the changes needed to do this the right
way is larger.
> I have done a large bulk of a new implementation that abstracts a 'ByteSource' from the
BinaryDecoder.  Currently BinaryDecoder is tightly coupled to InputStream.  The ByteSource
can wrap an InputStream, FileChannel, or byte[] in this version, but could be extended to
support other channel types, sockets, etc.  This abstraction allows the BinaryDecoder to buffer
data from various sources while supporting interleaved access to the underlying data and greater
flexibility going forward.
> The performance of this abstraction has been heavily tuned so that maximum performance
can be achieved even for slower ByteSource implementations.
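> To make the abstraction concrete, here is a minimal, self-contained sketch of the idea (the names here, ByteSource, ByteArraySource, and so on, are illustrative only, not the actual patch's API). The decoder reads Avro's zigzag-varint ints through a ByteSource, so the same decoding logic works whether the bytes come from a byte[] or an InputStream:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical ByteSource abstraction; the real patch may differ.
interface ByteSource {
    int read() throws IOException; // next unsigned byte, or -1 at end of source
}

class StreamByteSource implements ByteSource {
    private final InputStream in;
    StreamByteSource(InputStream in) { this.in = in; }
    public int read() throws IOException { return in.read(); }
}

class ByteArraySource implements ByteSource {
    private final byte[] buf;
    private int pos;
    ByteArraySource(byte[] buf) { this.buf = buf; }
    public int read() { return pos < buf.length ? (buf[pos++] & 0xff) : -1; }
}

public class ByteSourceSketch {
    // Avro ints are zigzag-encoded varints; decoding is identical
    // regardless of which ByteSource supplies the bytes.
    static int readInt(ByteSource in) throws IOException {
        int n = 0, shift = 0, b;
        do {
            b = in.read();
            if (b < 0) throw new IOException("unexpected end of input");
            n |= (b & 0x7f) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (n >>> 1) ^ -(n & 1); // undo zigzag
    }

    public static void main(String[] args) throws IOException {
        // 0xAC 0x02 is the varint for 300, which zigzag-decodes to 150;
        // 0x02 zigzag-decodes to 1.
        ByteSource a = new ByteArraySource(new byte[] { (byte) 0xAC, 0x02, 0x02 });
        System.out.println(readInt(a)); // 150
        System.out.println(readInt(a)); // 1
        ByteSource s = new StreamByteSource(new ByteArrayInputStream(new byte[] { 0x02 }));
        System.out.println(readInt(s)); // 1
    }
}
```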
> For readers that must interleave reads on a stream with the decoder, this includes a
> {code}
> public InputStream inputStream();
> {code}
> method on the decoder that can serve interleaved reads.  
> Additionally, it will be necessary to have a constructor on BinaryDecoder that allows
two BinaryDecoders to share a stream (and buffer).
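> A rough sketch of how interleaved access could work (again, illustrative names, not the patch itself): the InputStream view shares the decoder's read position, so raw reads and decoded reads advance the same cursor and stay in sync.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of a decoder whose inputStream() view shares the
// read position with the decoder, so decoded and raw reads interleave.
public class InterleavedDecoderSketch {
    private final byte[] buf;
    private int pos;

    InterleavedDecoderSketch(byte[] buf) { this.buf = buf; }

    int readInt() throws IOException { // zigzag varint, as Avro encodes ints
        int n = 0, shift = 0, b;
        do {
            if (pos >= buf.length) throw new IOException("end of input");
            b = buf[pos++] & 0xff;
            n |= (b & 0x7f) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (n >>> 1) ^ -(n & 1);
    }

    // Raw view over the same buffer: bytes consumed here advance the
    // decoder's position too, keeping the two in sync.
    InputStream inputStream() {
        return new InputStream() {
            public int read() {
                return pos < buf.length ? (buf[pos++] & 0xff) : -1;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        // 0x06 decodes to 3; then two raw payload bytes; then 0x04 decodes to 2.
        InterleavedDecoderSketch d =
            new InterleavedDecoderSketch(new byte[] { 0x06, 0x41, 0x42, 0x04 });
        System.out.println(d.readInt());       // 3
        InputStream raw = d.inputStream();
        System.out.println((char) raw.read()); // A
        System.out.println((char) raw.read()); // B
        System.out.println(d.readInt());       // 2
    }
}
```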
> Performance results for this new version are better than previous prototypes:
> *current trunk BinaryDecoder*
> {noformat}
> ReadInt: 983 ms, 30.497877855999185 million entries/sec
> ReadLongSmall: 1058 ms, 28.336666040111496 million entries/sec
> ReadLong: 1518 ms, 19.75179889508437 million entries/sec
> ReadFloat: 657 ms, 45.61031157924184 million entries/sec
> ReadDouble: 761 ms, 39.387756709704355 million entries/sec
> ReadBoolean: 331 ms, 90.4268145647456 million entries/sec
> RepeaterTest: 7718 ms, 3.886725782038378 million entries/sec
> NestedRecordTest: 1884 ms, 15.91964611687992 million entries/sec
> ResolverTest: 8296 ms, 3.616055866616717 million entries/sec
> MigrationTest: 21216 ms, 1.4139999570144013 million entries/sec
> {noformat}
> *buffering BinaryDecoder*
> {noformat}
> ReadInt: 187 ms, 160.22131904871262 million entries/sec
> ReadLongSmall: 372 ms, 80.4863521975457 million entries/sec
> ReadLong: 613 ms, 48.882385721129246 million entries/sec
> ReadFloat: 253 ms, 118.16606270679061 million entries/sec
> ReadDouble: 275 ms, 108.94314257389068 million entries/sec
> ReadBoolean: 222 ms, 134.85327963176064 million entries/sec
> RepeaterTest: 3335 ms, 8.993007936329503 million entries/sec
> NestedRecordTest: 1152 ms, 26.0256943004597 million entries/sec
> ResolverTest: 4213 ms, 7.120659335077578 million entries/sec
> MigrationTest: 15310 ms, 1.9594884898992941 million entries/sec
> {noformat}
> Performance is 2x to 5x the throughput of trunk on most tests.  
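> As a sanity check on that claim, the speedups implied by the timings above (trunk ms divided by buffering ms) can be computed directly:

```java
public class SpeedupCheck {
    public static void main(String[] args) {
        // Timings (ms) taken from the benchmark output above.
        String[] tests    = { "ReadInt", "ReadLong", "RepeaterTest", "MigrationTest" };
        double[] trunk    = { 983, 1518, 7718, 21216 };
        double[] buffered = { 187, 613, 3335, 15310 };
        for (int i = 0; i < tests.length; i++) {
            System.out.printf("%s: %.1fx%n", tests[i], trunk[i] / buffered[i]);
        }
    }
}
```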
> On Feb 2, 2010, at 2:20 PM, Doug Cutting wrote:
>> I'd like to roll an Avro 1.3 release candidate soon.
>> The biggest thing we're missing is interop testing for various 
>> implementations.  Ruby and Python both have data file implementations, 
>> so this should not be hard to add to these.  This requires two scripts, 
>> one to write a data file to build/test/interop/data/$lang.avro that uses 
>> the schema in share/test/schemas/interop.avsc.  The second then just 
>> needs to read all of the data files in build/test/interop/data/.  These 
>> scripts should both be invoked by the top-level build.sh.
>> RPC interop tests would be nice to have too, but I'm not sure we should 
>> hold the release for that.
>> Do folks agree that these are the right priorities for the 1.3 release? 
>> If so, do we have volunteers who can get data file interop tests 
>> implemented soon?  Ideally we might roll a release candidate late this week.
>> Thanks,
>> Doug
