activemq-users mailing list archives

From Gary Tully <gary.tu...@gmail.com>
Subject Re: A proposal to rewrite purgeInactiveDestinations locking to prevent queue GC lockups.
Date Tue, 24 Feb 2015 16:03:05 GMT
If there are any test failures, try to run them individually with
-Dtest=a,b etc. There may be an issue with a full test run, but all of
the tests that are enabled should work. I know there are some issues
with the JDBC tests that hang or fail because previous runs did not
clean up, but that should be most of it. I got a bunch of successful
full test runs before the 5.11 release, if that is any help.
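
For example, to run a couple of suites individually (the test class
names here are placeholders, not real suites):

    mvn test -Dtest=ClassA,ClassB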

On 23 February 2015 at 20:38, Kevin Burton <burton@spinn3r.com> wrote:
> OK.  This is ready to go and I have a patch branch:
>
> https://issues.apache.org/jira/browse/AMQ-5609
>
> I’m stuck at the moment though because the tests don’t pass.  But tests
> were failing before my changes, so I don’t think the failures have
> anything to do with them.
>
>
>
> On Sun, Feb 22, 2015 at 11:11 PM, Kevin Burton <burton@spinn3r.com> wrote:
>
>> Actually, is the lock even needed here?  Why would it be?  If we’re
>> *removing* a subscription, why does it matter if we ALSO remove a
>> separate, isolated queue before or after the subscription is removed?
>>
>> I think this is redundant and can be removed.  Maybe I’m wrong though.
>>
>> I looked at all the callers and none were associated with queues.
>>
>> On Sun, Feb 22, 2015 at 11:07 PM, Kevin Burton <burton@spinn3r.com> wrote:
>>
>>> So I have some working/theoretical code that should resolve this.
>>>
>>> It acquires a lock *per* ActiveMQDestination, so there is no lock
>>> contention between unrelated destinations.
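>>>
>>> For illustration, a minimal sketch of the per-destination lock idea
>>> (class and method names here are hypothetical, not the actual patch):
>>>
>>>     import java.util.concurrent.ConcurrentHashMap;
>>>     import java.util.concurrent.ConcurrentMap;
>>>     import java.util.concurrent.locks.ReadWriteLock;
>>>     import java.util.concurrent.locks.ReentrantReadWriteLock;
>>>
>>>     /** Hands out one ReadWriteLock per destination name. */
>>>     public class DestinationLocks {
>>>         private final ConcurrentMap<String, ReadWriteLock> locks =
>>>                 new ConcurrentHashMap<String, ReadWriteLock>();
>>>
>>>         public ReadWriteLock forDestination(String physicalName) {
>>>             ReadWriteLock lock = locks.get(physicalName);
>>>             if (lock == null) {
>>>                 ReadWriteLock fresh = new ReentrantReadWriteLock();
>>>                 ReadWriteLock prior = locks.putIfAbsent(physicalName, fresh);
>>>                 lock = (prior != null) ? prior : fresh;
>>>             }
>>>             return lock;
>>>         }
>>>     }
>>>
>>> The purge thread would take the write lock only for the destination it
>>> is removing, while producers/consumers take the read lock for theirs.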
>>>
>>> But here’s where I’m stuck.
>>>
>>>>     @Override
>>>>     public void removeSubscription(ConnectionContext context,
>>>>             RemoveSubscriptionInfo info) throws Exception {
>>>>         // Take the read side of the purge lock so destination GC
>>>>         // cannot run concurrently with subscription removal.
>>>>         inactiveDestinationsPurgeLock.readLock().lock();
>>>>         try {
>>>>             topicRegion.removeSubscription(context, info);
>>>>         } finally {
>>>>             inactiveDestinationsPurgeLock.readLock().unlock();
>>>>         }
>>>>     }
>>>
>>>
>>> .. this is in RegionBroker.
>>>
>>> There is no ActiveMQDestination involved here, so I’m not sure of the
>>> best way to resolve this.
>>>
>>> Any advice?
>>>
>>>
>>> On Sun, Feb 22, 2015 at 8:11 PM, Kevin Burton <burton@spinn3r.com> wrote:
>>>
>>>> Yes.  That was my thinking too: that just replacing the CopyOnWriteArraySet
>>>> with something more performant would solve the issue.
>>>>
>>>> This would also improve queue creation time as well as queue deletion
>>>> time.
>>>>
>>>> What I think I’m going to do in the meantime is:
>>>>
>>>> - implement a granular lock based on queue name… I am going to use an
>>>> interface so we can replace the implementation later.
>>>>
>>>> - implement timing for the purge thread so it tracks how long it takes
>>>> to remove each queue as well as how long the entire loop takes (see the
>>>> sketch below).
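>>>>
>>>> Rough sketch of the timing idea (method and logger names hypothetical):
>>>>
>>>>     // inside the purge thread's sweep over GC candidates
>>>>     long sweepStart = System.currentTimeMillis();
>>>>     for (ActiveMQDestination dest : candidates) {
>>>>         long start = System.currentTimeMillis();
>>>>         removeDestination(dest);
>>>>         LOG.debug("GC of {} took {} ms", dest,
>>>>                 System.currentTimeMillis() - start);
>>>>     }
>>>>     LOG.debug("Full purge sweep took {} ms",
>>>>             System.currentTimeMillis() - sweepStart);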
>>>>
>>>> I’ll do this on a branch so it should be easy to merge.
>>>>
>>>> On Sun, Feb 22, 2015 at 7:40 PM, Tim Bain <tbain@alumni.duke.edu> wrote:
>>>>
>>>>> A decent amount of the time is being spent calling remove() on various
>>>>> array-backed collections.  Those data structures might be inappropriate
>>>>> for the number of destinations you're running, since array-backed
>>>>> collections tend to have add/remove operations that are O(N); some
>>>>> improvement might come from something as simple as moving to a
>>>>> ConcurrentHashSet instead of a CopyOnWriteArraySet, for example.  (Or it
>>>>> might make performance worse because of other aspects of how those
>>>>> collections are used; people other than me would be in a better position
>>>>> to evaluate the full range of performance requirements for those
>>>>> collections.)
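>>>>>
>>>>> (Note: the JDK has no ConcurrentHashSet class; the usual stand-in is a
>>>>> set view over a ConcurrentHashMap.  A sketch of the swap being
>>>>> suggested, with a hypothetical element type:
>>>>>
>>>>>     import java.util.Collections;
>>>>>     import java.util.Set;
>>>>>     import java.util.concurrent.ConcurrentHashMap;
>>>>>     import java.util.concurrent.CopyOnWriteArraySet;
>>>>>
>>>>>     // Today: remove() is O(N) and every mutation copies the array.
>>>>>     Set<Object> current = new CopyOnWriteArraySet<Object>();
>>>>>
>>>>>     // Candidate: hash-based concurrent set, O(1) expected add/remove.
>>>>>     Set<Object> candidate =
>>>>>             Collections.newSetFromMap(new ConcurrentHashMap<Object, Boolean>());
>>>>>
>>>>> One trade-off: CopyOnWriteArraySet iterators see a fixed snapshot, while
>>>>> the hash-based set's iterators are only weakly consistent, which may
>>>>> matter to code that iterates while the set is being mutated.)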
>>>>>
>>>>> Scheduler.cancel() also takes an alarming amount of time for what looks
>>>>> like a really simple method:
>>>>>
>>>>> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.activemq/activemq-all/5.10.0/org/apache/activemq/thread/Scheduler.java#Scheduler.cancel%28java.lang.Runnable%29
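>>>>>
>>>>> For context, the linked 5.10.0 method is essentially:
>>>>>
>>>>>     public synchronized void cancel(Runnable task) {
>>>>>         TimerTask ticket = timerTasks.remove(task);
>>>>>         if (ticket != null) {
>>>>>             ticket.cancel();
>>>>>             timer.purge();   // scans the whole timer queue
>>>>>         }
>>>>>     }
>>>>>
>>>>> The cost is almost certainly timer.purge(), which java.util.Timer
>>>>> documents as taking time proportional to the number of queued tasks
>>>>> (n + c log n), so cancelling tasks one by one across many destinations
>>>>> adds up.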
>>>>>
>>>>> On Sun, Feb 22, 2015 at 7:56 PM, Kevin Burton <burton@spinn3r.com>
>>>>> wrote:
>>>>>
>>>>> > Here’s a JProfiler view with advisory support enabled, if you’re
>>>>> > curious:
>>>>> >
>>>>> > http://i.imgur.com/I1jesZz.jpg
>>>>> >
>>>>> > I’m not familiar enough with the internals of ActiveMQ to have any
>>>>> > obvious optimization ideas.
>>>>> >
>>>>> > One other idea I had (which would require a ton of refactoring, I
>>>>> > think) would be to potentially bulk delete all the queues at once.
>>>>> >
>>>>> >
>>>>> > On Sun, Feb 22, 2015 at 6:42 PM, Kevin Burton <burton@spinn3r.com>
>>>>> wrote:
>>>>> >
>>>>> > > And after spending some more time in JProfiler, it looks like 20%
>>>>> > > of this is due to schedulerSupport and the other 80% is due to
>>>>> > > advisorySupport.
>>>>> > >
>>>>> > > If I set both to false, the total runtime of my tests drops in
>>>>> > > half… and the latencies fall from
>>>>> > >
>>>>> > >> max create producer latency: 10,176 ms
>>>>> > >> max create message on existing producer and consumer: 2 ms
>>>>> > >
>>>>> > >
>>>>> > > … to
>>>>> > >
>>>>> > >> max create producer latency: 1 ms
>>>>> > >> max create message on existing producer and consumer: 1 ms
>>>>> > >
>>>>> > >
>>>>> > > … and this is without even fixing the purge background lock.
>>>>> > >
>>>>> > > So the question now is: what the heck is the advisory support doing
>>>>> > > that can result in such massive performance overhead?
>>>>> > >
>>>>> > > … and I think advisorySupport is enabled by default, so that’s
>>>>> > > problematic as well.
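>>>>> > >
>>>>> > > (For reference, both flags are plain BrokerService switches; a
>>>>> > > minimal sketch matching the non-persistent test setup:
>>>>> > >
>>>>> > >     import org.apache.activemq.broker.BrokerService;
>>>>> > >
>>>>> > >     BrokerService broker = new BrokerService();
>>>>> > >     broker.setPersistent(false);        // as in these tests
>>>>> > >     broker.setSchedulerSupport(false);  // the ~20% observed above
>>>>> > >     broker.setAdvisorySupport(false);   // the ~80% observed above
>>>>> > >     broker.start();
>>>>> > >
>>>>> > > The same switches exist as advisorySupport="false" and
>>>>> > > schedulerSupport="false" attributes on the <broker> element in
>>>>> > > activemq.xml.)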
>>>>> > >
>>>>> > >
>>>>> > > On Sun, Feb 22, 2015 at 4:45 PM, Kevin Burton <burton@spinn3r.com>
>>>>> > wrote:
>>>>> > >
>>>>> > >> OK.  Loaded up JProfiler and confirmed that it’s not LevelDB.
>>>>> > >>
>>>>> > >> This is a non-persistent broker I’m testing on.
>>>>> > >>
>>>>> > >> Looks like it’s spending all its time in CopyOnWriteArrayList.remove
>>>>> > >> and Timer.purge…
>>>>> > >>
>>>>> > >> Which is hopeful, because this is ALL due to ActiveMQ internals,
>>>>> > >> and in theory LevelDB should perform well if we improve the
>>>>> > >> performance of the ActiveMQ internals and fix this lock bug.
>>>>> > >>
>>>>> > >> Which would rock!
>>>>> > >>
>>>>> > >> It should ALSO make queue creation faster.
>>>>> > >>
>>>>> > >>
>>>>> > >> On Sun, Feb 22, 2015 at 4:10 PM, Kevin Burton <burton@spinn3r.com>
>>>>> > wrote:
>>>>> > >>
>>>>> > >>>
>>>>> > >>>
>>>>> > >>> On Sun, Feb 22, 2015 at 3:30 PM, Tim Bain <tbain@alumni.duke.edu>
>>>>> > wrote:
>>>>> > >>>
>>>>> > >>>> So if LevelDB cleanup during removeDestination() is the presumed
>>>>> > >>>> culprit, can we spin off the LevelDB cleanup work into a separate
>>>>> > >>>> thread (better: a task object to be serviced by a ThreadPool, so
>>>>> > >>>> you can avoid a fork bomb if we remove many destinations at once)
>>>>> > >>>> so the call to removeDestination() can return quickly and LevelDB
>>>>> > >>>> can do its record-keeping in the background without blocking
>>>>> > >>>> message processing?
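>>>>> > >>>>
>>>>> > >>>> A sketch of that shape (hypothetical helper names, not actual
>>>>> > >>>> broker code):
>>>>> > >>>>
>>>>> > >>>>     import java.util.concurrent.ExecutorService;
>>>>> > >>>>     import java.util.concurrent.Executors;
>>>>> > >>>>     import org.apache.activemq.command.ActiveMQDestination;
>>>>> > >>>>
>>>>> > >>>>     // One shared worker drains all cleanup tasks, so removing N
>>>>> > >>>>     // destinations queues N tasks rather than spawning N threads.
>>>>> > >>>>     final ExecutorService storeCleanup =
>>>>> > >>>>             Executors.newSingleThreadExecutor();
>>>>> > >>>>
>>>>> > >>>>     void removeDestination(final ActiveMQDestination dest) {
>>>>> > >>>>         unregisterFromRegion(dest);   // cheap in-memory bookkeeping
>>>>> > >>>>         storeCleanup.submit(new Runnable() {
>>>>> > >>>>             public void run() {
>>>>> > >>>>                 purgeStoreFor(dest);  // slow persistence-layer work
>>>>> > >>>>             }
>>>>> > >>>>         });
>>>>> > >>>>     }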
>>>>> > >>>>
>>>>> > >>>
>>>>> > >>> Would that be possible?  If the delete is pending on ActiveMQ,
>>>>> > >>> there is a race where a producer could re-create it unless the
>>>>> > >>> lock is held.
>>>>> > >>>
>>>>> > >>> Though I guess if you dispatched to the GC thread WITH the lock
>>>>> > >>> still held you would be OK, but I think if we use the existing
>>>>> > >>> purge thread then we’re fine.
>>>>> > >>>
>>>>> > >>> OK. I think I’m wrong about LevelDB being the issue.  To be fair,
>>>>> > >>> I wasn’t 100% certain before, but I should have specified.
>>>>> > >>>
>>>>> > >>> Our current production broker is running with persistent=false… and
>>>>> > >>> I just re-ran the tests without disk persistence and it has the
>>>>> > >>> same problem.
>>>>> > >>>
>>>>> > >>> So the main issue now is why the heck ActiveMQ is taking SO LONG
>>>>> > >>> to GC a queue.  It’s taking about 100 ms, which is an insane
>>>>> > >>> amount of time considering this is all done in memory.
>>>>> > >>>
>>>>> > >>> Kevin
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> <https://plus.google.com/102718274791889610666/posts>
> <http://spinn3r.com>
