activemq-users mailing list archives

From Timothy Bish <tabish...@gmail.com>
Subject Re: A proposal to rewrite purgeInactiveDestinations locking to prevent queue GC lockups.
Date Tue, 24 Feb 2015 18:54:59 GMT
On 02/24/2015 01:22 PM, Kevin Burton wrote:
> oh  geez… seriously though.  can you give me an estimate?
>
> Assuming 30 seconds each, I’m estimating about 11 hours.

That's probably not too far off; I usually just let them run overnight for 
the 'all' profile.

>
> http://activemq.apache.org/junit-reports.html
> https://hudson.apache.org/hudson/job/ActiveMQ/
> Hudson is down for ActiveMQ.
>
> Also, is there a disk space requirement for the tests? I have 10GB free or
> so and I’m getting TONS of these:
I don't think anyone has looked into the minimum free disk space required 
for the tests to run.

>
>
> java.lang.RuntimeException: Failed to start provided job scheduler store: JobSchedulerStore:activemq-data
>     at java.io.RandomAccessFile.seek(Native Method)
>     at org.apache.activemq.util.RecoverableRandomAccessFile.seek(RecoverableRandomAccessFile.java:384)
>     at org.apache.activemq.store.kahadb.disk.page.PageFile.readPage(PageFile.java:877)
>     at org.apache.activemq.store.kahadb.disk.page.Transaction.load(Transaction.java:427)
>     at org.apache.activemq.store.kahadb.disk.page.Transaction.load(Transaction.java:377)
>     at org.apache.activemq.store.kahadb.disk.index.BTreeIndex.load(BTreeIndex.java:159)
>     at org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl$MetaData.load(JobSchedulerStoreImpl.java:90)
>     at org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl$3.execute(JobSchedulerStoreImpl.java:277)
>     at org.apache.activemq.store.kahadb.disk.page.Transaction.execute(Transaction.java:779)
>     at org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl.doStart(JobSchedulerStoreImpl.java:261)
>     at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)
>     at org.apache.activemq.broker.BrokerService.setJobSchedulerStore(BrokerService.java:1891)
>     at org.apache.activemq.transport.stomp.StompTestSupport.createBroker(StompTestSupport.java:174)
>     at org.apache.activemq.transport.stomp.StompTestSupport.startBroker(StompTestSupport.java:112)
>     at org.apache.activemq.transport.stomp.StompTestSupport.setUp(StompTestSupport.java:94)
>     at org.apache.activemq.transport.stomp.Stomp12Test.setUp(Stomp12Test.java:41)
>     at org.apache.activemq.transport.stomp.Stomp12SslAuthTest.setUp(Stomp12SslAuthTest.java:38)
>
> On Tue, Feb 24, 2015 at 10:09 AM, Timothy Bish <tabish121@gmail.com> wrote:
>
>> On 02/24/2015 01:01 PM, Kevin Burton wrote:
>>
>>> How long do the tests usually take?
>>>
>> An eternity + 1hr
>>
>>> I’m looking at 45 minutes right now before I gave up… I think part of it was that the box I was testing on was virtualized and didn’t have enough resources.
>>>
>>> I tried to parallelize the tests (-T 8 with maven) but I got other errors so I assume the ports are singletons.
>>>
>> They won't be happy if you run them in parallel.
>>
>>
>>
>>> On Tue, Feb 24, 2015 at 8:03 AM, Gary Tully <gary.tully@gmail.com> wrote:
>>>
>>>> If there are any test failures, try to run them individually with -Dtest=a,b etc. There may be an issue with a full test run, but all of the tests that are enabled should work. I know there are some issues with jdbc tests that hang or fail due to previous runs not cleaning up, but that should be most of it. I got a bunch of full test runs before the 5.11 release if that is any help.
>>>>
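(As an aside: with the Maven Surefire plugin, running a single test looks like the line below; the class name is just an example taken from the stack trace above.)

    mvn test -Dtest=Stomp12Test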
>>>> On 23 February 2015 at 20:38, Kevin Burton <burton@spinn3r.com> wrote:
>>>>
>>>>> OK.  This is ready to go and I have a patch branch:
>>>>>
>>>>> https://issues.apache.org/jira/browse/AMQ-5609
>>>>>
>>>>> I’m stuck at the moment though because tests don’t pass.  But it was failing tests before so I don’t think it has anything to do with my changes.
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Feb 22, 2015 at 11:11 PM, Kevin Burton <burton@spinn3r.com> wrote:
>>>>
>>>>>> Actually, is the lock even needed here?  Why would it be?  If we’re *removing* a subscription, why does it care if we possibly ALSO remove a separate / isolated queue before/after the subscription is removed?
>>>>>>
>>>>>> I think this is redundant and can be removed.  Maybe I’m wrong though.
>>>>>>
>>>>>> I looked at all the callers and none were associated with queues.
>>>>>>
>>>>>> On Sun, Feb 22, 2015 at 11:07 PM, Kevin Burton <burton@spinn3r.com> wrote:
>>>>>>> So I have some working/theoretical code that should resolve this.  It acquires a lock *per* ActiveMQDestination; this way there is no lock contention.
>>>>>>>
>>>>>>> But here’s where I’m stuck:
>>>>>>>
>>>>>>>>     @Override
>>>>>>>>     public void removeSubscription(ConnectionContext context, RemoveSubscriptionInfo info) throws Exception {
>>>>>>>>         inactiveDestinationsPurgeLock.readLock().lock();
>>>>>>>>         try {
>>>>>>>>             topicRegion.removeSubscription(context, info);
>>>>>>>>         } finally {
>>>>>>>>             inactiveDestinationsPurgeLock.readLock().unlock();
>>>>>>>>         }
>>>>>>>>     }
>>>>>>> .. this is in RegionBroker.
>>>>>>>
>>>>>>> There is no ActiveMQDestination involved here, so I’m not sure of the best way to resolve this.
>>>>>>>
>>>>>>> Any advice?
>>>>>>>
>>>>>>>
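(For illustration, a per-destination lock registry along the lines Kevin describes might look like the sketch below. This is an assumption-laden sketch, not code from the AMQ-5609 branch; the class and method names are invented.)

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Hypothetical sketch: one ReadWriteLock per destination name, so the
    // purge thread only contends with operations on the queue it is
    // actually removing.
    public class DestinationLockRegistry {

        private final ConcurrentMap<String, ReadWriteLock> locks =
                new ConcurrentHashMap<String, ReadWriteLock>();

        // Look up (or lazily create) the lock for a destination.
        public ReadWriteLock lockFor(String destinationName) {
            ReadWriteLock lock = locks.get(destinationName);
            if (lock == null) {
                ReadWriteLock created = new ReentrantReadWriteLock();
                ReadWriteLock existing = locks.putIfAbsent(destinationName, created);
                lock = existing != null ? existing : created;
            }
            return lock;
        }

        // Drop the entry once the destination is gone so the map does not
        // grow without bound.
        public void release(String destinationName) {
            locks.remove(destinationName);
        }
    }

(The removeSubscription() case above is the hard part precisely because there is no destination to key on; a broker-wide fallback lock for destination-less operations is one possible answer.)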
>>>>>>> On Sun, Feb 22, 2015 at 8:11 PM, Kevin Burton <burton@spinn3r.com> wrote:
>>>>>>>
>>>>>>>> Yes.  That was my thinking too.. that just replacing the CopyOnWriteArraySet with something more performant would solve the issue.
>>>>>>>>
>>>>>>>> This would also improve queue creation time as well as queue deletion time.
>>>>>>>>
>>>>>>>> What I think I’m going to do in the meantime is:
>>>>>>>>
>>>>>>>> - implement a granular lock based on queue name… I am going to use an interface so we can replace the implementation later.
>>>>>>>>
>>>>>>>> - implement timing for the purge thread so it tracks how long it takes to remove a queue but also how long the entire loop takes (a rough sketch of this follows below).
>>>>>>>>
>>>>>>>> I’ll do this on a branch so it should be easy to merge.
>>>>>>>>
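(A minimal sketch of that timing idea, assuming invented names throughout; removeDestination() below is a placeholder for the existing removal path, not the real RegionBroker method signature.)

    import java.util.List;
    import java.util.concurrent.TimeUnit;

    // Hypothetical sketch: time each queue removal and the sweep as a whole.
    public class PurgeTimer {

        public void timedSweep(List<String> inactiveQueues) {
            long sweepStart = System.nanoTime();
            for (String queue : inactiveQueues) {
                long start = System.nanoTime();
                removeDestination(queue);
                System.out.printf("GC of %s took %d ms%n", queue,
                        TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
            }
            System.out.printf("full purge sweep took %d ms%n",
                    TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - sweepStart));
        }

        private void removeDestination(String queue) {
            // placeholder for the real per-queue removal work
        }
    }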
>>>>>>>> On Sun, Feb 22, 2015 at 7:40 PM, Tim Bain <tbain@alumni.duke.edu> wrote:
>>>>>>>>> A decent amount of the time is being spent calling remove() on various array-backed collections.  Those data structures might be inappropriate for the number of destinations you're running, since array-backed collections tend to have add/remove operations that are O(N); some improvement might come from something as simple as moving to a ConcurrentHashSet instead of a CopyOnWriteArraySet, for example.  (Or it might make performance worse because of other aspects of how those collections are used; people other than me would be in a better position to evaluate the full range of performance requirements for those collections.)
>>>>>>>>>
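(For concreteness: the JDK has no ConcurrentHashSet class, so the usual stand-in for the swap Tim describes is a Set view over a ConcurrentHashMap, as sketched below. His caveat still applies: CopyOnWriteArraySet pays O(N) on mutation but gives very cheap snapshot iteration, which dispatch paths may depend on.)

    import java.util.Collections;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    public class ConcurrentSetSketch {
        public static void main(String[] args) {
            // Concurrent Set with near-O(1) add/remove, instead of
            // CopyOnWriteArraySet's copy-the-array-on-every-mutation.
            Set<String> consumers =
                    Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
            consumers.add("consumer-1");
            consumers.remove("consumer-1");  // no array copy on removal
        }
    }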
>>>>>>>>> Scheduler.cancel() also takes an alarming amount of time for what looks like a really simple method (
>>>>>>>>> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.activemq/activemq-all/5.10.0/org/apache/activemq/thread/Scheduler.java#Scheduler.cancel%28java.lang.Runnable%29
>>>>>>>>> ).
>>>>>>>>> On Sun, Feb 22, 2015 at 7:56 PM, Kevin Burton <burton@spinn3r.com> wrote:
>>>>>>>>>
>>>>>>>>>> Here’s a JProfiler view with the advisory support enabled if you’re curious.
>>>>>>>>>>
>>>>>>>>>> http://i.imgur.com/I1jesZz.jpg
>>>>>>>>>>
>>>>>>>>>> I’m not familiar enough with the internals of ActiveMQ to have any obvious optimization ideas.
>>>>>>>>>>
>>>>>>>>>> One other idea I had (which would require a ton of refactoring I think) would be to potentially bulk delete all the queues at once.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sun, Feb 22, 2015 at 6:42 PM, Kevin Burton <burton@spinn3r.com> wrote:
>>>>>>>>>
>>>>>>>>>>> And spending some more time in JProfiler, it looks like 20% of this is due to schedulerSupport and the other 80% is due to advisorySupport.  If I set both to false the total runtime of my tests drops in half… and the latencies fall from
>>>>>>>>>>>
>>>>>>>>>>>> max create producer latency: 10,176 ms
>>>>>>>>>>>> max create message on existing producer and consumer: 2 ms
>>>>>>>>>>>
>>>>>>>>>>> … to
>>>>>>>>>>>
>>>>>>>>>>>> max create producer latency: 1 ms
>>>>>>>>>>>> max create message on existing producer and consumer: 1 ms
>>>>>>>>>>>
>>>>>>>>>>> and this is even without fixing the purge background lock.
>>>>>>>>>>>
>>>>>>>>>>> So the question now is what the heck the advisory support is doing that can result in such massive performance overhead.
>>>>>>>>>>>
>>>>>>>>>>> … and I think advisorySupport is enabled by default, so that’s problematic as well.
>>>>>>>>>>>
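(For reference, both switches are real BrokerService flags, and advisorySupport does default to true; in broker XML they are the advisorySupport and schedulerSupport attributes on the broker element. A minimal embedded-broker sketch of the configuration being tested:)

    import org.apache.activemq.broker.BrokerService;

    public class NoAdvisoryBroker {
        public static void main(String[] args) throws Exception {
            BrokerService broker = new BrokerService();
            broker.setPersistent(false);       // matches the non-persistent test setup
            broker.setAdvisorySupport(false);  // disable advisory topic traffic
            broker.setSchedulerSupport(false); // disable the job scheduler store
            broker.start();
            broker.waitUntilStopped();
        }
    }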
>>>>>>>>>>> On Sun, Feb 22, 2015 at 4:45 PM, Kevin Burton <burton@spinn3r.com> wrote:
>>>>>>>>>>>> OK.  Loaded up JProfiler and confirmed that it’s not LevelDB.  This is a non-persistent broker I’m testing on.
>>>>>>>>>>>>
>>>>>>>>>>>> Looks like it’s spending all its time in CopyOnWriteArrayList.remove and Timer.purge…
>>>>>>>>>>>>
>>>>>>>>>>>> Which is hopeful because this is ALL due to ActiveMQ internals, and in theory LevelDB should perform well if we improve the performance of ActiveMQ internals and fix this lock bug.  Which would rock!
>>>>>>>>>>>>
>>>>>>>>>>>> It should ALSO make queue creation faster.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Feb 22, 2015 at 4:10 PM, Kevin Burton <burton@spinn3r.com> wrote:
>>>>>>>>>>>>> On Sun, Feb 22, 2015 at 3:30 PM, Tim Bain <tbain@alumni.duke.edu> wrote:
>>>>>>>>>>>>>> So if LevelDB cleanup during removeDestination() is the presumed culprit, can we spin off the LevelDB cleanup work into a separate thread (better: a task object to be serviced by a ThreadPool so you can avoid a fork bomb if we remove many destinations at once) so the call to removeDestination() can return quickly and LevelDB can do its record-keeping in the background without blocking message-processing?
>>>>>>>>>>>>>> Would that be possible?
>>>>>>>>>>>>>
>>>>>>>>>>>>> If the delete is pending on ActiveMQ there is a race where a producer could re-create it unless the lock is held.  Though I guess if you dispatched to the GC thread WITH the lock still held you would be ok, but I think if we use the existing purge thread then we’re fine.
>>>>>>>>>>>>> OK. I think I’m wrong about LevelDB being the issue.  To be fair, I wasn’t 100% certain before, but I should have specified.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Our current production broker is running with persistent=false… and I just re-ran the tests without disk persistence and it has the same problem.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So the main issue now is why the heck ActiveMQ is taking SO LONG to GC a queue.  It’s taking about 100 ms, which is an insane amount of time considering this is all done in memory.
>>>>>>>>>>>>> Kevin
>>>>>>>>>>>>>
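(Pulling the two ideas together, one hedged sketch: mark a destination as pending removal before handing the slow store cleanup to a bounded pool, so a producer cannot silently re-create it during the window Kevin describes. Every name here is invented, not ActiveMQ API; note that a ReentrantReadWriteLock cannot simply be handed across threads, since unlock() must run on the locking thread, which is why this uses a visibility flag instead.)

    import java.util.Collections;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Hypothetical sketch combining Tim's thread-pool idea with Kevin's
    // re-creation race: producers consult pendingRemoval before creating.
    public class BackgroundDestinationGc {

        private final Set<String> pendingRemoval =
                Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
        // Bounded pool rather than a thread per destination, avoiding the
        // fork bomb Tim mentions when many destinations go at once.
        private final ExecutorService cleanupPool = Executors.newFixedThreadPool(2);

        public void removeDestination(final String name) {
            pendingRemoval.add(name);  // visible to producers immediately
            cleanupPool.execute(new Runnable() {
                @Override
                public void run() {
                    try {
                        // slow store record-keeping (e.g. LevelDB cleanup)
                    } finally {
                        pendingRemoval.remove(name);
                    }
                }
            });
        }

        // The producer's create path would check this before re-creating.
        public boolean isBeingRemoved(String name) {
            return pendingRemoval.contains(name);
        }
    }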


-- 
Tim Bish
Sr Software Engineer | RedHat Inc.
tim.bish@redhat.com | www.redhat.com
skype: tabish121 | twitter: @tabish121
blog: http://timbish.blogspot.com/

