From Clebert Suconic <csuco...@redhat.com>
Subject Optimizations on Proton-j
Date Tue, 29 Apr 2014 13:27:28 GMT
Last week I did some work on optimizing the Codec, and I think I've gotten some interesting results:

- The Decoder is now stateless, meaning the same instance can be used over and over (there is no
longer a need for one instance per connection). Bozo Dragojefic has actually seen how heavy it is
to create a Decoder, and he recently optimized MessageImpl to always reuse the same instance
through ThreadLocals. This optimization goes a step further.
- I have changed the ListDecoders so that you won't need intermediate objects to parse Types.
For now I have only applied this to Transfer, but I could do the same for all the other
Types at some point.
- I found a few hotspots in the test and refactored them accordingly, with no semantic
changes.
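
To make the "no intermediate objects" point concrete, here is a minimal sketch (not the proton-j
code) contrasting a generic path, which first collects boxed field values into a List<Object>,
with a direct path that decodes each field straight into its target. TransferLite, ListDecoding
and the three-field layout are hypothetical; the type codes are borrowed from AMQP 1.0 primitives,
but the list framing is deliberately simplified and is not the real AMQP list encoding.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, cut-down Transfer with only three fields (not the proton-j class).
final class TransferLite {
    long handle;
    long deliveryId;
    boolean settled;
}

final class ListDecoding {
    // Type codes borrowed from AMQP 1.0 primitives; the framing used here is a simplified
    // [field count][code][value]... layout, NOT the real AMQP list encoding.
    static final int NULL = 0x40, TRUE = 0x41, FALSE = 0x42, SMALL_UINT = 0x52, UINT = 0x70;

    // Generic step: decode one value and box it into an Object.
    static Object readObject(ByteBuffer buf) {
        int code = buf.get() & 0xFF;
        switch (code) {
            case NULL: return null;
            case TRUE: return Boolean.TRUE;
            case FALSE: return Boolean.FALSE;
            case SMALL_UINT: return (long) (buf.get() & 0xFF);
            case UINT: return buf.getInt() & 0xFFFFFFFFL;
            default: throw new IllegalArgumentException("unknown type code " + code);
        }
    }

    // Generic path: materialize every field in an intermediate List, then copy into the type.
    static TransferLite decodeViaList(ByteBuffer buf) {
        int count = buf.get() & 0xFF;
        List<Object> fields = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            fields.add(readObject(buf));
        }
        TransferLite t = new TransferLite();
        t.handle = (Long) fields.get(0);
        t.deliveryId = (Long) fields.get(1);
        t.settled = Boolean.TRUE.equals(fields.get(2));
        return t;
    }

    // Reads a uint field as a primitive, without boxing.
    static long readUint(ByteBuffer buf) {
        int code = buf.get() & 0xFF;
        switch (code) {
            case SMALL_UINT: return buf.get() & 0xFF;
            case UINT: return buf.getInt() & 0xFFFFFFFFL;
            default: throw new IllegalArgumentException("expected uint, got " + code);
        }
    }

    // Reads a boolean field as a primitive; null is treated as false in this sketch.
    static boolean readBoolean(ByteBuffer buf) {
        int code = buf.get() & 0xFF;
        if (code == TRUE) return true;
        if (code == FALSE || code == NULL) return false;
        throw new IllegalArgumentException("expected boolean, got " + code);
    }

    // Direct path: each field goes straight into the target object, with no List and no boxing.
    static TransferLite decodeDirect(ByteBuffer buf) {
        buf.get(); // field count; this sketch assumes exactly three fields
        TransferLite t = new TransferLite();
        t.handle = readUint(buf);
        t.deliveryId = readUint(buf);
        t.settled = readBoolean(buf);
        return t;
    }
}
```

Both paths produce the same TransferLite; the direct one simply skips the List and the boxed
values, which is where the per-message garbage comes from in the generic version.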

As a result of these optimizations, DecoderImpl no longer has a setBuffer method.
Instead, each read method takes the buffer as its first argument, followed by its old
parameters: read(ReadableBuffer, ...old signature).
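
The difference between the two API shapes can be sketched as follows. This is a hypothetical
illustration, not DecoderImpl itself: it uses java.nio.ByteBuffer in place of ReadableBuffer to
stay self-contained, and StatefulDecoder, StatelessDecoder, PER_THREAD and INSTANCE are made-up
names. The ThreadLocal holder mirrors the per-thread reuse idea mentioned for MessageImpl.

```java
import java.nio.ByteBuffer;

// Hypothetical "before" shape: the buffer is part of the decoder's state, so an instance
// cannot be shared between connections or threads without extra care.
final class StatefulDecoder {
    private ByteBuffer buffer;

    void setBuffer(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    int readInt() {
        return buffer.getInt();
    }

    // One workaround for the stateful shape: keep one instance per thread.
    static final ThreadLocal<StatefulDecoder> PER_THREAD =
            ThreadLocal.withInitial(StatefulDecoder::new);
}

// Hypothetical "after" shape: no mutable fields; the buffer is an argument of every read,
// so one shared instance can serve all threads (each thread still uses its own buffer).
final class StatelessDecoder {
    static final StatelessDecoder INSTANCE = new StatelessDecoder();

    private StatelessDecoder() {}

    int readInt(ByteBuffer buffer) {
        return buffer.getInt();
    }

    long readLong(ByteBuffer buffer) {
        return buffer.getLong();
    }
}
```

With the stateless shape there is nothing to allocate or look up per connection or per thread:
callers just pass whatever buffer they are currently parsing.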

And speaking of ReadableBuffer, I have introduced the ReadableBuffer interface. When integrating
with the broker, I ran into a situation where I won't have a ByteBuffer, and this interface will
let me further optimize the Parser later, since I could use a Netty Buffer (aka
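
The idea behind such an interface can be sketched like this. The interface below is a
hypothetical, minimal version with only three methods; it is not the actual proton-j
ReadableBuffer. ByteArrayReader stands in for a non-ByteBuffer source such as a Netty buffer
without depending on Netty, and Parser.sumInts is a made-up consumer.

```java
import java.nio.ByteBuffer;

// Hypothetical minimal read-side abstraction; not the actual proton-j interface.
interface ReadableBuffer {
    byte get();
    int getInt();
    int remaining();
}

// Adapter over java.nio.ByteBuffer.
final class ByteBufferReader implements ReadableBuffer {
    private final ByteBuffer buf;

    ByteBufferReader(ByteBuffer buf) { this.buf = buf; }

    public byte get() { return buf.get(); }
    public int getInt() { return buf.getInt(); }
    public int remaining() { return buf.remaining(); }
}

// Adapter over a byte[] region, standing in for a non-ByteBuffer source.
// Integers are read big-endian, matching ByteBuffer's default byte order.
final class ByteArrayReader implements ReadableBuffer {
    private final byte[] data;
    private final int limit;
    private int pos;

    ByteArrayReader(byte[] data, int offset, int length) {
        this.data = data;
        this.pos = offset;
        this.limit = offset + length;
    }

    public byte get() {
        if (pos >= limit) throw new IndexOutOfBoundsException();
        return data[pos++];
    }

    public int getInt() {
        if (limit - pos < 4) throw new IndexOutOfBoundsException();
        int v = ((data[pos] & 0xFF) << 24) | ((data[pos + 1] & 0xFF) << 16)
                | ((data[pos + 2] & 0xFF) << 8) | (data[pos + 3] & 0xFF);
        pos += 4;
        return v;
    }

    public int remaining() { return limit - pos; }
}

// Decoding code written against the interface works with either source.
final class Parser {
    static int sumInts(ReadableBuffer in) {
        int sum = 0;
        while (in.remaining() >= 4) {
            sum += in.getInt();
        }
        return sum;
    }
}
```

The parser never sees which concrete buffer it is reading from, so adding another adapter
later does not require touching the decoding code.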

You can find these optimizations in my branch on GitHub: https://github.com/clebertsuconic/qpid-proton/tree/optimizations

The branch contains two commits:

I - A micro-benchmark where I added a test case that reads the buffer directly, without any
framework. I actually wrote a simple parser that works for the byte array I have, and it is
very close to reading directly from the bytes.
   I used it to compare raw reading and interpretation of the buffer against the current
framework.
   I was particularly concerned about the number of intermediate objects, so I used it to map
these differences.


II - A commit with the actual optimizations.


Without these optimizations, my MicroBenchmark, parsing 10000000L (10 million) instances of
Transfer without reallocating any buffers, completed on my laptop in:

- 3480 milliseconds, against 750 milliseconds with raw reading

After these optimizations:
- 1927 milliseconds, against 750 milliseconds with raw reading
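
Dividing the reported times by the 10^7 iterations gives a per-Transfer view of the same
numbers. This is plain arithmetic on the figures above, not an additional measurement:

```latex
\begin{aligned}
\text{before: } & \frac{3480\ \text{ms}}{10^{7}} = 348\ \text{ns per Transfer} \\
\text{after: } & \frac{1927\ \text{ms}}{10^{7}} \approx 193\ \text{ns per Transfer} \\
\text{raw: } & \frac{750\ \text{ms}}{10^{7}} = 75\ \text{ns per Transfer} \\
\text{speedup: } & \frac{3480}{1927} \approx 1.81 \\
\text{overhead over raw: } & 3480 - 750 = 2730\ \text{ms} \;\to\; 1927 - 750 = 1177\ \text{ms} \quad (\approx 57\%\ \text{less})
\end{aligned}
```

Put differently, the codec went from roughly 4.6x to roughly 2.6x the cost of raw reading.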

Note that this should also reduce the memory footprint of the codec, but I'm not measuring that.
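
For anyone who wants to try a similar raw-versus-decoder comparison, here is a rough sketch of
a timing harness. It is not the benchmark from the branch: RawVsDecoder, FRAME, rawRead,
boxedRead and timeMillis are hypothetical, and the "framework-like" read only imitates
intermediate objects by boxing each field into an array.

```java
import java.nio.ByteBuffer;
import java.util.function.LongSupplier;

// Rough sketch of a raw-vs-decoder timing comparison; not the benchmark from the branch.
final class RawVsDecoder {
    // Hypothetical frame: two big-endian ints (7 and 256).
    static final byte[] FRAME = {0, 0, 0, 7, 0, 0, 1, 0};

    // Results are accumulated here so the JIT cannot discard the timed loop as dead code.
    static long sink;

    static int intAt(byte[] b, int i) {
        return ((b[i] & 0xFF) << 24) | ((b[i + 1] & 0xFF) << 16)
                | ((b[i + 2] & 0xFF) << 8) | (b[i + 3] & 0xFF);
    }

    // "Raw" read: fields are pulled straight out of the byte array.
    static long rawRead(byte[] b) {
        return (long) intAt(b, 0) + intAt(b, 4);
    }

    // "Framework-like" read: goes through a buffer and boxes each field into an
    // intermediate array, loosely imitating a generic decoder.
    static long boxedRead(ByteBuffer buf) {
        buf.rewind();
        Object[] fields = {buf.getInt(), buf.getInt()};
        int a = (Integer) fields[0];
        int c = (Integer) fields[1];
        return (long) a + c;
    }

    // Warms up, then times `iterations` calls of op; returns elapsed milliseconds.
    static long timeMillis(long iterations, LongSupplier op) {
        for (int i = 0; i < 100_000; i++) {
            sink += op.getAsLong(); // crude warm-up so the JIT compiles the path first
        }
        long start = System.nanoTime();
        for (long i = 0; i < iterations; i++) {
            sink += op.getAsLong();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

A run might compare timeMillis(10_000_000L, () -> RawVsDecoder.rawRead(RawVsDecoder.FRAME))
with timeMillis(10_000_000L, () -> RawVsDecoder.boxedRead(buf)) for
buf = ByteBuffer.wrap(RawVsDecoder.FRAME). Hand-rolled loops like this are easily distorted by
JIT effects (escape analysis may even remove the boxing), so a harness such as JMH is the safer
choice for numbers meant to be compared across changes.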

I'm looking forward to working with this group. I actually had a meeting with Rafi and Ted last
week, and I plan to work more closely with you on this.

Clebert Suconic

