river-dev mailing list archives

From Patricia Shanahan <p...@acm.org>
Subject Re: Trunk merge and thread pools
Date Sat, 05 Dec 2015 04:50:02 GMT
I'll look into it.

On 12/4/2015 8:01 PM, Peter wrote:
> Yes,
> There's a tool called Rat, and a shell script included in our trunk
> that checks our license headers etc. are compliant for release.  I
> don't recall the details unfortunately, but I think you download the
> Rat tool, set an env variable and run the script.  That would be very
> helpful.
> Thanks,
> Peter.
> Sent from my Samsung device.
> ---- Original message ----
> From: Patricia Shanahan <pats@acm.org>
> Sent: 05/12/2015 12:01:04 pm
> To: dev@river.apache.org
> Subject: Re: Trunk merge and thread pools
> Are there any practical things I can do to expedite the release? For
> example, if you rough draft documentation and/or release notes, I can do
> some editing.
> On 12/4/2015 5:08 PM, Peter wrote:
>>   Trunk has now been replaced.  The only changes I'll make now are
>>   documentation, release notes, build scripts, license header checks
>>   and key signing.
>>   I could use some help as I am time poor.
>>   Presently I'm marking bugs on Jira as resolved, then I'll generate
>>   the release notes.
>>   This release has been more thoroughly tested than any previous River
>>   release.
>>   After 3 is released, we could really utilise your experience,
>>   Patricia, especially with JERI's ConnectionManager and Multiplexer.
>>   It uses complex shared and nested locks and supports 128 concurrent
>>   connections (shared endpoints) over one Socket or Channel.  If I can
>>   get contention with 2 CPUs, it's only going to get worse in a real
>>   world situation.
>>   This contention would only affect nodes that share multiple remote
>>   objects between them.  I suspect Gregg's use case will have multiple
>>   connections between node pairs and hit this contention.  I also
>>   suspect that Greg is likely to only have 1 or 2 connections between
>>   nodes.  Once two nodes have more than 128 connections (connections
>>   are directly proportional to the number of remote objects, server or
>>   client, shared between two nodes) another multiplexer will be created,
>>   and so on.  Multiplexers sync on the ConnectionManager's monitor.
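The scaling described above can be illustrated with a small piece of arithmetic. This is only a sketch of the "one more multiplexer per 128 connections" rule as stated in the message; the class and method names are hypothetical and it is not JERI's actual code.

```java
public final class MultiplexerEstimate {
    // Figure quoted in the message: 128 concurrent connections per multiplexer.
    public static final int CONNECTIONS_PER_MUX = 128;

    /** Ceiling division: multiplexers needed for a given connection count. */
    public static int multiplexersNeeded(int connections) {
        if (connections < 0) {
            throw new IllegalArgumentException("negative connection count");
        }
        return (connections + CONNECTIONS_PER_MUX - 1) / CONNECTIONS_PER_MUX;
    }

    public static void main(String[] args) {
        // Hypothetical connection counts between one pair of nodes.
        for (int n : new int[] {1, 128, 129, 300}) {
            System.out.println(n + " connections -> "
                    + multiplexersNeeded(n) + " multiplexer(s)");
        }
    }
}
```

If every multiplexer synchronizes on the same ConnectionManager monitor, as the message says, the number of parties competing for that one lock would grow with this count.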
>>   I have a Sun T5240 (128-way, 64GB RAM), with DilOS (an Illumos-based
>>   distro with Debian package management).  Soon, I should have a high
>>   speed IPv6 connection courtesy of the NBN. When I do I'll set you up
>>   with a remote login for testing.
>>   I'll delay further discussion of security until after 3 is released.
>>   The changes I propose will have no bearing (won't be in their call
>>   stack) on those who aren't concerned about security.
>>   I'll be grateful for an opportunity to present my security code,
>>   perhaps doing so may even dispel some fears.
>>   Regards,
>>   Peter.
>>   Sent from my Samsung device.
>>   ---- Original message ----
>>   From: Patricia Shanahan <pats@acm.org>
>>   Sent: 05/12/2015 01:37:10 am
>>   To: dev@river.apache.org
>>   Subject: Re: Trunk merge and thread pools
>>   If you have a real world workload that shows contention, we could
>>   make serious progress on performance improvement - after 3.0 ships.
>>   I am not even disagreeing with changes that are only shown to make
>>   the tests more effective - after 3.0 ships.
>>   I am unsure about whether Peter is tilting at windmills or showing
>>   the optimum future direction for River with his security ideas. I
>>   would be happy to discuss the topic - after 3.0 ships.
>>   River 2.2.2 was released November 18, 2013, over two years ago.
>>   There is already a lot of good stuff in 3.0 that should be available
>>   to users.
>>   I have a feeling at this point that we will still be discussing what
>>   should be in 3.0 this time next year. In order to get 3.0 out, I
>>   believe we need to freeze it. That means two types of changes only
>>   until it ships - changes related to organizing the release and fixes
>>   for deal-killing regression bugs.
>>   If I had the right skills and knowledge to finish up the release I
>>   would do it. I don't. Ironically, I do know about multiprocessor
>>   performance - I was performance architect for the Sun E10000 and
>>   SunFire 15k. Given a suitable benchmark environment, I would love to
>>   work on contention - after 3.0 ships.
>>   Patricia
>>   On 12/4/2015 6:19 AM, Gregg Wonderly wrote:
>>>   With a handful of clients, you can ignore contention.  My
>>>   applications have 20s of threads per client making very frequent
>>>   calls through the service and this means that 10ms delays evolve
>>>   into seconds of delay fairly quickly.
>>>   I believe that if you can measure the contention with tooling, on
>>>   your desktop, it is a viable goal to reduce it or eliminate it.
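One way to measure monitor contention on a desktop JVM is the standard `java.lang.management.ThreadMXBean`. The following is a minimal sketch, not the tooling anyone in this thread used; the class name `ContentionProbe` and its helper methods are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public final class ContentionProbe {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** Turns on blocked-time accounting where the JVM supports it. */
    public static boolean enableTiming() {
        if (!THREADS.isThreadContentionMonitoringSupported()) {
            return false;
        }
        THREADS.setThreadContentionMonitoringEnabled(true);
        return true;
    }

    /** Times the thread has blocked entering a monitor, or -1 if it is not alive. */
    public static long blockedCount(Thread t) {
        ThreadInfo info = THREADS.getThreadInfo(t.getId());
        return info == null ? -1 : info.getBlockedCount();
    }

    public static void main(String[] args) {
        boolean timing = enableTiming();
        // Entries can be null for threads that ended between the two calls.
        for (ThreadInfo info : THREADS.getThreadInfo(THREADS.getAllThreadIds())) {
            if (info == null) {
                continue;
            }
            // getBlockedTime() returns -1 when timing is not enabled.
            System.out.println(info.getThreadName()
                    + " blockedCount=" + info.getBlockedCount()
                    + " blockedMillis=" + (timing ? info.getBlockedTime() : -1));
        }
    }
}
```

Sampling these counters before and after a test run gives a rough per-thread picture of how often, and for how long, threads waited on monitors.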
>>>   It's like system time vs user time optimizations of old.  Now we
>>>   are contending for processor cores instead of the processor, locked
>>>   in the kernel, unable to dispatch more network traffic where it is
>>>   always convenient to bury latency.
>>>   Gregg
>>>   Sent from my iPhone
>>>   On Dec 4, 2015, at 9:57 AM, Greg Trasuk <trasukg@stratuscom.com>
>>>   wrote:
>>>>>   On Dec 4, 2015, at 1:16 AM, Peter <jini@zeus.net.au> wrote:
>>>>>   Since ObjectInputStream is a big hotspot, for testing
>>>>>   purposes I merged these changes into my local version of
>>>>>   River.  My validating ObjectInputStream outperforms the
>>>>>   standard Java ois.
>>>>>   Then TaskManager, used by the test, became a problem, with
>>>>>   tasks in contention up to 30% of the time.
>>>>>   Next I replaced TaskManager with an ExecutorService (River 3
>>>>>   only uses TaskManager in tests now, it's no longer used by
>>>>>   release code), but there was still contention, although not
>>>>>   quite as bad.
>>>>>   Then I noticed that tasks in the test call Thread.yield(),
>>>>>   which tends to thrash, so I replaced it with a short sleep of
>>>>>   100ms.
>>>>>   Now monitor state was a maximum of 5%, much better.
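The two test changes described above (an ExecutorService in place of TaskManager, and a short sleep in place of Thread.yield()) can be sketched as follows. This is an illustration under assumptions, not the River test code: the class `PollingTaskSketch`, the method `awaitFlag`, and the flag-polling task are all hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public final class PollingTaskSketch {
    /**
     * Waits until the flag is set, sleeping between checks rather than
     * spinning on Thread.yield(). Returns the number of sleeps taken.
     */
    public static int awaitFlag(AtomicBoolean flag, long sleepMillis) {
        int polls = 0;
        while (!flag.get()) {
            polls++;
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                break;
            }
        }
        return polls;
    }

    public static void main(String[] args) throws Exception {
        // A plain fixed pool stands in for the ExecutorService in the message.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AtomicBoolean ready = new AtomicBoolean(false);

        Callable<Integer> waiterTask = () -> awaitFlag(ready, 100); // 100ms, as in the message
        Runnable setterTask = () -> ready.set(true);

        Future<Integer> waiter = pool.submit(waiterTask);
        pool.submit(setterTask);

        System.out.println("sleeps before flag was seen: " + waiter.get());
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

A sleeping task gives up its core for the whole interval, whereas a yielding task is immediately runnable again, which is one plausible reason the yield version thrashed.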
>>>>>   After these changes, the hotspot consuming 27% CPU was JERI's
>>>>>   ConnectionManager.connect, followed by
>>>>>   Class.getDeclaredMethod at 15.5%, Socket.accept at 14.4% and
>>>>>   Class.newInstance at 10.8%.
>>>>   First - performance optimization:  Unless you’re testing with
>>>>   real-life workloads, in real-life-like network environments,
>>>>   you’re wasting your time.  In the real world, clients discover
>>>>   services pretty rarely, and real-world architects always make
>>>>   sure that communications time is small compared to processing
>>>>   time.  In the real world, remote call latency is controlled by
>>>>   network bandwidth and the speed of light.  Running in the
>>>>   integration test environment, you’re seeing processor loads, not
>>>>   network loads. There isn’t any need for this kind of
>>>>   micro-optimization.  All you’re doing is delaying shipping, no
>>>>   matter how wonderful you keep telling us it is.
>>>>>   My validating ois, originating from Apache Harmony, was
>>>>>   modified to use explicit constructors during deserialization.
>>>>>   This addressed finalizer attacks, final field immutability and
>>>>>   input stream validation and the ois itself places a limit on
>>>>>   downloaded bytes by controlling
