Subject: Re: consumer prefetch considered harmful ..evict from the prefetch buffer?
From: Kevin Burton
Date: Mon, 8 Jun 2015 12:56:08 -0700
To: users@activemq.apache.org

Looks like bumping up the number of connections (and the threads that
come with them) has DEFINITELY improved performance. We're now doing
about 2x more throughput.
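In case it helps anyone else, the change is roughly the following,
sketched against the plain JMS API. The broker URL, queue name, and the
exact counts are placeholders, not our real values, and the small
queuePrefetch is just illustrative (we run with a tiny prefetch anyway):

    import javax.jms.Connection;
    import javax.jms.MessageConsumer;
    import javax.jms.Session;

    import org.apache.activemq.ActiveMQConnectionFactory;

    public class MultiConnectionConsumers {
        public static void main(String[] args) throws Exception {
            // Placeholder URL; jms.prefetchPolicy.queuePrefetch keeps the
            // per-consumer prefetch small.
            ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(
                "tcp://broker:61616?jms.prefetchPolicy.queuePrefetch=1");

            int connections = 10;            // was 1
            int consumersPerConnection = 50; // was ~500 on one connection

            for (int c = 0; c < connections; c++) {
                Connection conn = factory.createConnection();
                conn.start();
                for (int i = 0; i < consumersPerConnection; i++) {
                    // One session + consumer per worker thread, as discussed
                    // further down in this thread.
                    Session session =
                        conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
                    MessageConsumer consumer =
                        session.createConsumer(session.createQueue("tasks"));
                    consumer.setMessageListener(message -> {
                        // hand off to the worker that owns this consumer
                    });
                }
            }
        }
    }

The idea, assuming each connection gets its own transport thread (more on
that below), is that dispatch into the prefetch buffers stops being
bottlenecked on a single thread.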
We went from about 20% of our threads used at any point in time to about
45% (higher is better). Our task throughput is now 3x (higher is
definitely better), and CPU usage went from 20% to 60% (since it's doing
more work). So I think this was definitely a choke point.

Maybe recommend that users on higher-end systems use multiple connections
so that they get more threads. I think we're an edge case though ;) We
always seem to be!

On Mon, Jun 8, 2015 at 8:58 AM, Kevin Burton wrote:

>> I can see two potential problems that your description didn't draw a
>> line between:
>>
>>    1. With a large prefetch buffer, it's possible to have one thread
>>    have a large number of prefetched tasks and another have none, even
>>    if all tasks take an average amount of time to complete. No thread
>>    is slow per se, but because the messages were prefetched lopsidedly,
>>    one thread sits idle while the other churns through what's on its
>>    plate.
>>
>
> Yes. Totally. I think I mentioned that, but maybe didn't spell it out
> perfectly.
>
> This is one major edge case that needs to be addressed.
>
>>    2. With *any* prefetch buffer size, it's possible to have one message
>>    that takes forever to complete. Any messages caught behind that one
>>    slow message are stuck until it finishes.
>>
>
> No.. that won't happen, because I have one consumer per thread, so
> others are dispatched on the other consumers. Even if prefetch is one.
>
> At least I have a test for this, and I believe this to be the case and
> have verified that my test works properly.
>
> But on this you *may* be right, and we're just explaining it
> differently. I use one thread per consumer, so as long as there's a
> message in prefetch, then I'm good.
>
> The problem is, I think, that my high CPU is stalling out ActiveMQ, so I
> can't stay prefetched.
>
>> Which scenario are you worried about here?
>>
>> If the latter, the AbortSlowAckConsumerStrategy (
>> http://timbish.blogspot.com/2013/07/coming-in-activemq-59-new-way-to-abort.html;
>> sadly the wiki doesn't detail this strategy and Tim's personal blog
>> post is the best documentation available) is intended to address
>> exactly this:
>
> Oh yes.. I think I remember reading this. (A rough sketch of enabling it
> is pasted a little further down.)
>
> Yes.. the duplicate processing is far from ideal.
>
>> If the former, you're basically looking to enable work-stealing between
>> consumers, and I'm not aware of any existing capability to do that. If
>> you wanted to implement it, you'd probably want to implement it as a
>> sibling class to AbortSlowAckConsumerStrategy where SlowAck is the
>> trigger but StealWork is the action rather than Abort.
>
> Yes.. but I think you detailed the reason why it's not ideal - it
> requires a lot of work!
>
>> I'm a little skeptical that your worker threads could so thoroughly
>> smother the CPU that the thread doing the prefetching gets starved out
>
> You underestimate the power of the dark side... :)
>
> We're at very high CPU load... with something like 250-500 threads per
> daemon.
>
> We're CPU-oriented, so if there's work to be done and we're not at 100%
> CPU, then we're wasting compute resources.
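Side note, since AbortSlowAckConsumerStrategy came up above: here's a
minimal sketch of what enabling it might look like, based on the 5.9 API
described in Tim's post. The timeout values are made up, and most people
would set the equivalent attributes on a policyEntry in activemq.xml
rather than using an embedded broker as shown here:

    import org.apache.activemq.broker.BrokerService;
    import org.apache.activemq.broker.region.policy.AbortSlowAckConsumerStrategy;
    import org.apache.activemq.broker.region.policy.PolicyEntry;
    import org.apache.activemq.broker.region.policy.PolicyMap;

    public class SlowAckBroker {
        public static void main(String[] args) throws Exception {
            // Abort consumers that haven't acked anything recently;
            // the thresholds here are invented for illustration.
            AbortSlowAckConsumerStrategy strategy =
                new AbortSlowAckConsumerStrategy();
            strategy.setMaxTimeSinceLastAck(30000); // 30s without an ack
            strategy.setIgnoreIdleConsumers(true);  // skip consumers with
                                                    // nothing to ack
            strategy.setAbortConnection(false);     // close the consumer,
                                                    // not the connection

            PolicyEntry entry = new PolicyEntry();
            entry.setQueue(">");                    // apply to all queues
            entry.setSlowConsumerStrategy(strategy);

            PolicyMap policy = new PolicyMap();
            policy.setDefaultEntry(entry);

            BrokerService broker = new BrokerService();
            broker.setDestinationPolicy(policy);
            broker.addConnector("tcp://0.0.0.0:61616");
            broker.start();
            broker.waitUntilStopped();
        }
    }

The caveat above still applies: an aborted consumer's unacked messages
get redelivered elsewhere, so some duplicate processing is possible.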
>> (particularly since I'd expect it to be primarily I/O-bound, so its
>> CPU usage should be minimal), though I guess if you had as many worker
>> threads as cores you might be able to burn through all the prefetched
>> messages before the ActiveMQ thread gets rescheduled. But I assume that
>> your workers are doing non-trivial amounts of work and are probably
>> getting context-switched repeatedly during their processing, which I'd
>> think would give the ActiveMQ thread plenty of time to do what it needs
>> to. Unless 1) you've set thread priorities to prioritize your workers
>> over ActiveMQ, in which case don't do that,
>
> Yes. My threads should be at minimum priority. I would verify that, but
> there's a Linux bug that causes jstack to show the wrong value.
>
> Does anyone know what priority the ActiveMQ transport thread runs under?
> The above bug prevents me from (easily) figuring that out.
>
>> 2) your worker threads are somehow holding onto a lock that the
>> ActiveMQ thread needs, which is possible but seems unlikely, or 3)
>> you've set up so many consumers (far more than you have cores) that the
>> 1/(N+1)th that the ActiveMQ thread gets is too little or too infrequent
>> to maintain responsiveness, in which case you need to scale back your
>> worker thread pool size (which I think means using fewer consumers per
>> process, based on what you've described).
>
> Yes. This is the case. What I'm thinking of doing is the reverse: use
> more connections and put them in a connection pool.
>
> So right now, if I have 1 connection and 500 workers, I have a 1:500
> ratio. But if I bump that up to just 10... I'll have 1:50.
>
> I think this is more realistic, and it would mean more CPU time to keep
> the prefetch buffer warm. If I sized it at 1:1 (which I think would be
> wasteful of resources), the problem would effectively be solved (but it
> would waste memory to solve it).
>
> But maybe around 1:10 or 1:20 it would be resolved.
>
> I need to verify that ActiveMQ allocates another thread per connection,
> but I'm pretty sure it does.
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> ... or check out my Google+ profile
>

--
Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
... or check out my Google+ profile