From: Helen Kwong
Date: Wed, 19 Oct 2016 16:49:54 -0700
Subject: Re: Qpid broker 6.0.4 performance issues
To: Ramayan Tiwari
Cc: users@qpid.apache.org

Hi Rob,

Again, thank you so much for answering our questions and providing a patch so quickly :) One more question I have: would it be possible to include test cases involving many queues and listeners (on the order of thousands of queues) in future Qpid releases, as part of standard perf testing of the broker?

Thanks,
Helen

On Tue, Oct 18, 2016 at 10:40 AM, Ramayan Tiwari wrote:

> Thanks so much Rob, I will test the patch against trunk and will update you with the outcome.
>
> - Ramayan
>
> On Tue, Oct 18, 2016 at 2:37 AM, Rob Godfrey wrote:
>
>> On 17 October 2016 at 21:50, Rob Godfrey wrote:
>>
>> > On 17 October 2016 at 21:24, Ramayan Tiwari wrote:
>> >
>> >> Hi Rob,
>> >>
>> >> We are certainly interested in testing the "multi queue consumers" behavior with your patch in the new broker. We would like to know:
>> >>
>> >> 1. What will be the scope of the changes - client, broker or both? We are currently running the 0.16 client, so we would like to make sure that we will be able to use these changes with the 0.16 client.
>> >
>> > There's no change to the client. I can't remember what was in the 0.16 client... the only issue would be if there are any bugs in the parsing of address arguments. I can try to test that out tomorrow.
>>
>> OK - with a little bit of care to get round the address parsing issues in the 0.16 client... I think we can get this to work. I've created the following JIRA:
>>
>> https://issues.apache.org/jira/browse/QPID-7462
>>
>> and attached to it are a patch which applies against trunk, and a separate patch which applies against the 6.0.x branch (https://svn.apache.org/repos/asf/qpid/java/branches/6.0.x - this is 6.0.4 plus a few other fixes which we will soon be releasing as 6.0.5).
>>
>> To create a consumer which uses this feature (and multi queue consumption) for the 0.16 client, you need to use something like the following as the address:
>>
>> queue_01 ; {node : { type : queue }, link : { x-subscribes : { arguments : { x-multiqueue : [ queue_01, queue_02, queue_03 ], x-pull-only : true }}}}
>>
>> Note that the initial queue_01 has to be the name of an actual queue on the virtual host, but otherwise it is not actually used (if you were using a 0.32 or later client you could just use '' here). The actual queues that are consumed from are those in the list value associated with x-multiqueue. For my testing I created a list with 3000 queues here and this worked fine.
>>
>> Let me know if you have any questions / issues,
>>
>> Hope this helps,
>> Rob
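As a rough illustration, a consumer using such an address might be set up from the 0.16-era JMS client along the following lines; the connection URL, JNDI property names and queue names are placeholder assumptions, not values taken from this thread.

    import java.util.Properties;
    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.Destination;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.MessageListener;
    import javax.jms.Session;
    import javax.naming.Context;
    import javax.naming.InitialContext;

    public class MultiQueueConsumerSketch {
        public static void main(String[] args) throws Exception {
            Properties env = new Properties();
            env.put(Context.INITIAL_CONTEXT_FACTORY,
                    "org.apache.qpid.jndi.PropertiesFileInitialContextFactory");
            // Placeholder connection URL - adjust credentials, vhost and broker host.
            env.put("connectionfactory.qpidCF",
                    "amqp://guest:guest@clientid/default?brokerlist='tcp://localhost:5672'");
            // The leading queue_01 must name a real queue on the virtual host; the
            // queues actually consumed from are those listed under x-multiqueue.
            env.put("destination.multiQueueDest",
                    "queue_01 ; {node : { type : queue }, link : { x-subscribes : "
                    + "{ arguments : { x-multiqueue : [ queue_01, queue_02, queue_03 ], "
                    + "x-pull-only : true }}}}");

            Context ctx = new InitialContext(env);
            ConnectionFactory factory = (ConnectionFactory) ctx.lookup("qpidCF");
            Destination destination = (Destination) ctx.lookup("multiQueueDest");

            Connection connection = factory.createConnection();
            final Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
            MessageConsumer consumer = session.createConsumer(destination);
            consumer.setMessageListener(new MessageListener() {
                public void onMessage(Message message) {
                    try {
                        // process the message, then commit the transacted session
                        session.commit();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
            connection.start();
        }
    }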
>> >> 2. My understanding is that the "pull vs push" change is only with respect to the broker, and it does not change our architecture, where we use a MessageListener to receive messages asynchronously.
>> >
>> > Exactly - this is only a change within the internal broker threading model. The external behaviour of the broker remains essentially unchanged.
>> >
>> >> 3. Once the I/O refactoring is complete, we would be able to go back to using standard JMS consumers (Destination) - what is the timeline and broker release version for the completion of this work?
>> >
>> > You might wish to continue to use the "multi queue" model, depending on your actual use case, but yeah, once the I/O work is complete I would hope that you could use the thousands-of-consumers model should you wish. We don't have a schedule for the next phase of I/O rework right now - about all I can say is that it is unlikely to be complete this year. I'd need to talk with Keith (who is currently on vacation) as to when we think we may be able to schedule it.
>> >
>> >> Let me know once you have integrated the patch and I will re-run our performance tests to validate it.
>> >
>> > I'll make a patch for 6.0.x presently (I've been working on a change against trunk - the patch will probably have to change a bit to apply to 6.0.x).
>> >
>> > Cheers,
>> > Rob
>> >
>> >> Thanks
>> >> Ramayan
>> >>
>> >> On Sun, Oct 16, 2016 at 3:30 PM, Rob Godfrey wrote:
>> >>
>> >> > OK - so having pondered / hacked around a bit this weekend, I think to get decent performance from the IO model in 6.0 for your use case we're going to have to change things around a bit.
>> >> >
>> >> > Basically, 6.0 is an intermediate step on our IO / threading model journey. In earlier versions we used 2 threads per connection for IO (one read, one write) and then extra threads from a pool to "push" messages from queues to connections.
>> >> >
>> >> > In 6.0 we moved to using a pool for the IO threads, and also stopped queues from "pushing" to connections while the IO threads were acting on the connection. It's this latter fact which is screwing up performance for your use case here, because what happens is that on each network read we tell each consumer to stop accepting pushes from the queue until the IO interaction has completed. This is causing lots of loops over your 3000 consumers on each session, which is eating up a lot of CPU on every network interaction.
>> >> >
>> >> > In the final version of our IO refactoring we want to remove the "pushing" from the queue, and instead have the consumers "pull" - so that the only threads that operate on the queues (outside of housekeeping tasks like expiry) will be the IO threads.
>> >> >
>> >> > So, what we could do (and I have a patch sitting on my laptop for this) is to look at using the "multi queue consumers" work I did for you guys before, but augmenting this so that the consumers work using a "pull" model rather than the push model. This will guarantee strict fairness between the queues associated with the consumer (which was the issue you had with this functionality before, I believe). Using this model you'd only need a small number (one?) of consumers per session. The patch I have is to add this "pull" mode for these consumers (essentially this is a preview of how all consumers will work in the future).
>> >> > Does this seem like something you would be interested in pursuing?
>> >> >
>> >> > Cheers,
>> >> > Rob
>> >> >
>> >> > On 15 October 2016 at 17:30, Ramayan Tiwari <ramayan.tiwari@gmail.com> wrote:
>> >> >
>> >> > > Thanks Rob. Apologies for sending this over the weekend :(
>> >> > >
>> >> > > Are there any docs on the new threading model? I found this on confluence:
>> >> > >
>> >> > > https://cwiki.apache.org/confluence/display/qpid/IO+Transport+Refactoring
>> >> > >
>> >> > > We are also interested in understanding the threading model a little better, to help us figure out its impact for our usage patterns. It would be very helpful if there are more docs/JIRAs/email threads with some details.
>> >> > >
>> >> > > Thanks
>> >> > >
>> >> > > On Sat, Oct 15, 2016 at 9:21 AM, Rob Godfrey <rob.j.godfrey@gmail.com> wrote:
>> >> > >
>> >> > > > So I *think* this is an issue because of the extremely large number of consumers. The threading model in v6 means that whenever a network read occurs for a connection, it iterates over the consumers on that connection - obviously where there are a large number of consumers this is burdensome. I fear addressing this may not be a trivial change... I shall spend the rest of my afternoon pondering this...
>> >> > > >
>> >> > > > - Rob
>> >> > > >
>> >> > > > On 15 October 2016 at 17:14, Ramayan Tiwari <ramayan.tiwari@gmail.com> wrote:
>> >> > > >
>> >> > > > > Hi Rob,
>> >> > > > >
>> >> > > > > Thanks so much for your response. We use transacted sessions with non-persistent delivery. The prefetch size is 1 and every message is the same size (200 bytes).
>> >> > > > >
>> >> > > > > Thanks
>> >> > > > > Ramayan
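As a rough illustration of that client configuration (a transacted session, non-persistent delivery, and a prefetch of 1 set via the legacy client's maxprefetch connection URL option), a sketch might look like the following; the broker URL, credentials and queue name are placeholders, not values from this thread.

    import javax.jms.Connection;
    import javax.jms.DeliveryMode;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import org.apache.qpid.client.AMQConnection;

    public class ClientConfigSketch {
        public static void main(String[] args) throws Exception {
            // maxprefetch='1' limits each consumer to a single unacknowledged message.
            Connection connection = new AMQConnection(
                    "amqp://guest:guest@clientid/default"
                    + "?brokerlist='tcp://localhost:5672'&maxprefetch='1'");
            Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
            Queue queue = session.createQueue("queue_01");         // placeholder queue name
            MessageProducer producer = session.createProducer(queue);
            producer.setDeliveryMode(DeliveryMode.NON_PERSISTENT); // non-persistent delivery
            producer.send(session.createTextMessage("~200 byte payload"));
            session.commit();                                      // transacted session
            connection.close();
        }
    }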
>> >> > > > > On Sat, Oct 15, 2016 at 2:59 AM, Rob Godfrey <rob.j.godfrey@gmail.com> wrote:
>> >> > > > >
>> >> > > > > > Hi Ramayan,
>> >> > > > > >
>> >> > > > > > this is interesting... in our testing (which admittedly didn't cover the case of this many queues / listeners) we saw the 6.0.x broker using less CPU on average than the 0.32 broker. I'll have a look this weekend as to why creating the listeners is slower. On the dequeuing, can you give a little more information on the usage pattern - are you using transactions, auto-ack or client ack? What prefetch size are you using? How large are your messages?
>> >> > > > > >
>> >> > > > > > Thanks,
>> >> > > > > > Rob
>> >> > > > > >
>> >> > > > > > On 14 October 2016 at 23:46, Ramayan Tiwari <ramayan.tiwari@gmail.com> wrote:
>> >> > > > > >
>> >> > > > > > > Hi All,
>> >> > > > > > >
>> >> > > > > > > We have been validating the new Qpid broker (version 6.0.4), have compared it against broker version 0.32, and are seeing major regressions. Following is the summary of our test setup and results:
>> >> > > > > > >
>> >> > > > > > > *1. Test Setup*
>> >> > > > > > > *a).* The Qpid broker runs on a dedicated host (12 cores, 32 GB RAM).
>> >> > > > > > > *b).* For 0.32, we allocated a 16 GB heap. For the 6.0.4 broker, we use an 8 GB heap and 8 GB of direct memory.
>> >> > > > > > > *c).* For 6.0.4, flow to disk has been configured at 60%.
>> >> > > > > > > *d).* Both brokers use the BDB host type.
>> >> > > > > > > *e).* The brokers have around 6000 queues, and we create 16 listener sessions/threads spread over 3 connections, where each session is listening to 3000 queues. However, messages are only enqueued on and processed from 10 queues.
>> >> > > > > > > *f).* We enqueue 1 million messages across 10 different queues (evenly divided) at the start of the test. Dequeuing only starts once all the messages have been enqueued. We run the test for 2 hours and process as many messages as we can. Each message takes around 200 milliseconds to process.
>> >> > > > > > > *g).* We have used both the 0.16 and 6.0.4 clients for these tests (the 6.0.4 client only with the 6.0.4 broker).
>> >> > > > > > >
>> >> > > > > > > *2. Test Results*
>> >> > > > > > > *a).* The System Load Average (read the notes below on how we compute it) for the 6.0.4 broker is 5x that of the 0.32 broker. During the start of the test (when we are not doing any dequeuing), the load average is normal (0.05 for the 0.32 broker and 0.1 for the new broker); however, while we are dequeuing messages, the load average is very high (around 0.5 consistently).
>> >> > > > > > > *b).* The time to create listeners in the new broker has gone up by 220% compared to the 0.32 broker (when using the 0.16 client). For the old broker, creating 16 sessions each listening to 3000 queues takes 142 seconds; in the new broker it took 456 seconds. If we use the 6.0.4 client, it took even longer, a 524% increase (887 seconds).
>> >> > > > > > > *I).* The time to create consumers increases as we create more listeners on the same connections. We have 20 sessions (but end up using around 5 of them) on each connection, and we create about 3000 consumers and attach a MessageListener to each. Each successive session takes longer (an approximately linear increase) to set up the same number of consumers and listeners.
>> >> > > > > > >
>> >> > > > > > > *3). How we compute System Load Average*
>> >> > > > > > > We query the MBean attribute SystemLoadAverage and divide it by the value of the attribute AvailableProcessors. Both of these attributes are available on the java.lang OperatingSystem MBean.
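As a concrete illustration of that calculation, both values can be read from the platform OperatingSystemMXBean (the java.lang:type=OperatingSystem MBean). The sketch below uses local ManagementFactory access for simplicity, whereas the test above presumably sampled the broker's JMX interface remotely.

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;

    public class LoadAverageSketch {
        public static void main(String[] args) {
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            double loadAverage = os.getSystemLoadAverage();    // 1-minute system load average
            int processors = os.getAvailableProcessors();
            double normalisedLoad = loadAverage / processors;  // e.g. ~0.5 reported during dequeue
            System.out.printf("Normalised load average: %.2f%n", normalisedLoad);
        }
    }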
>> >> > > > > > > I am not sure what is causing these regressions and would like your help in understanding them. We are aware of the changes with respect to the threading model in the new broker; are there any design docs that we can refer to in order to understand these changes at a high level? Can we tune some parameters to address these issues?
>> >> > > > > > >
>> >> > > > > > > Thanks
>> >> > > > > > > Ramayan