qpid-users mailing list archives

From CLIVE <cl...@ckjltd.co.uk>
Subject Re: QPID performance on virtual machines
Date Fri, 04 May 2012 16:41:17 GMT
Carl,

I ran a test today on the Dell R710 physical machine with qpidd running 
against Google's tcmalloc (I exported 
LD_PRELOAD=/home/clive/libs/libtcmalloc_minimal.so before starting the 
qpidd process).
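For reference, the preload step can be sketched as a small wrapper script. The library path is the one quoted above; whether libtcmalloc_minimal.so lives there on another box is an assumption to check, and the broker/perftest invocations are left commented since they depend on a local qpid install.

```shell
#!/bin/sh
# Preload tcmalloc for the broker if the library is present; otherwise
# fall back silently to the default glibc allocator.
TCMALLOC=/home/clive/libs/libtcmalloc_minimal.so
if [ -r "$TCMALLOC" ]; then
    export LD_PRELOAD="$TCMALLOC"
fi
echo "allocator preload: ${LD_PRELOAD:-none}"
# qpidd --auth no    # start the broker under the chosen allocator
# qpid-perftest      # defaults: 500000 transient messages, direct exchange
```

The same LD_PRELOAD trick applies to qpid-perftest itself if you want the client side allocating through tcmalloc as well.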

When qpid-perftest was executed with its default values, I saw the 
publish and consume rates rise from 85000/80000 to 108000/105000 
transfers/sec. A significant increase.

Producing QPID's own thread-optimized malloc, or incorporating an 
existing third-party version into the build, might have some merit.

Anyway, I thought you might like to know.

As an aside, I hope to try Intel's TBB next, so I will keep you informed 
on how this performs.

Clive

On 03/05/2012 22:56, Carl Trieloff wrote:
>
> I was chatting to Kim about this this week, and I believe we should do
> something along these lines (a custom memory allocator) for quite a few
> reasons.
>
> Carl.
>
>
> On 05/03/2012 05:42 PM, CLIVE wrote:
>> Steve,
>>
>> Just one other thought. On other multi-threaded applications I have
>> usually found a significant speed-up by moving to a more thread-efficient
>> memory allocator, like those provided by Intel's Thread Building Blocks
>> (TBB) or Google's tcmalloc (part of google-perftools).
>>
>> Is this something that you think might be worth a look, or is QPID
>> doing something clever already?
>>
>> Clive
>>
>> On 03/05/2012 22:04, Steve Huston wrote:
>>> Ok, Clive - thanks very much for the follow-up! Glad you have this
>>> situation in hand now.
>>>
>>> -Steve
>>>
>>>> -----Original Message-----
>>>> From: CLIVE [mailto:clive@ckjltd.co.uk]
>>>> Sent: Thursday, May 03, 2012 4:53 PM
>>>> To: Steve Huston
>>>> Cc: users@qpid.apache.org; 'James Kirkland'
>>>> Subject: Re: QPID performance on virtual machines
>>>>
>>>> Steve,
>>>>
>>>> Managed to run some more performance tests today using a RHEL5u4 VM
>>>> on a Dell R710. Ran qpid-perftest with default values on the same VM
>>>> as qpidd; each test was run several times, with the calculated average
>>>> shown in the table below.
>>>>
>>>> CPUs    RAM    Publish    Consume
>>>>   2     4G     48K        46K
>>>>   4     4G     65K        60K
>>>>   6     4G     73K        66K
>>>>   2     8G     46K        44K
>>>>   4     8G     65K        61K
>>>>   6     8G     74K        67K
>>>>
>>>> Basically it confirms your assertion about the broker using more
>>>> threads under heavy load. Changing the VM memory had no discernible
>>>> effect on performance, but increasing the number of CPUs available to
>>>> the VM had a big effect on throughput.
>>>>
>>>> So when defining a VM for QPID transient usage, focus on CPU
>>>> allocation!
>>>>
>>>> Thanks for the advice and help.
>>>>
>>>> Clive
>>>>
>>>>
>>>>
>>>> On 03/05/2012 15:27, Steve Huston wrote:
>>>>> Hi Clive,
>>>>>
>>>>> The broker will use threads based on load - if the broker takes longer
>>>>> to process a message than qpid-perftest takes to send the next
>>>>> message, the broker would need more threads.
>>>>>
>>>>> A more pointed test for broker performance would be to run the client
>>>>> on another host - then you know the non-VM vs. VM differences are just
>>>>> the broker's actions. It may be a little confusing weeding out the
>>>>> actual vs. virtual NIC issues, but there would be no confusion about
>>>>> how much the client is taking away from resources available to the
>>>>> broker.
>>>>>
>>>>> -Steve
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: CLIVE [mailto:clive@ckjltd.co.uk]
>>>>>> Sent: Wednesday, May 02, 2012 5:28 PM
>>>>>> To: users@qpid.apache.org
>>>>>> Cc: Steve Huston; 'James Kirkland'
>>>>>> Subject: Re: QPID performance on virtual machines
>>>>>>
>>>>>> Steve,
>>>>>>
>>>>>> I thought about this as well, so I restarted the broker on the
>>>>>> physical Dell R710 with the threads option set to just 4, and saw the
>>>>>> same throughput values (85000 publish and 80000 subscribe). As
>>>>>> reducing the thread count didn't seem to have much effect on the
>>>>>> physical machine, I thought that this probably wasn't the issue.
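[The thread-count experiment above can be repeated from a one-liner. The broker's default worker-thread count is #CPUs + 1, and --worker-threads is the qpidd option used to pin it; this sketch assumes a Linux-style getconf for the CPU lookup.]

```shell
#!/bin/sh
# Compute the broker's default worker-thread count (#CPUs + 1) on this host.
CPUS=$(getconf _NPROCESSORS_ONLN)
DEFAULT_THREADS=$((CPUS + 1))
echo "CPUs: $CPUS, default qpidd worker threads: $DEFAULT_THREADS"
# To pin the count, as in the test above:
#   qpidd --worker-threads 4 --daemon
```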
>>>>>>
>>>>>> As the qpid-perftest application was only creating 1 producer and 1
>>>>>> consumer, I reasoned that perhaps the broker was only using two
>>>>>> threads to service the reads and writes from these clients, and that
>>>>>> this was why reducing the thread count on the broker had no effect.
>>>>>> Would you expect the broker to use more than two threads to service
>>>>>> the clients in this scenario?
>>>>>>
>>>>>> I will rerun the test tomorrow with an increased number of CPUs in
>>>>>> the VM(s), just to double-check whether it is a number-of-cores
>>>>>> issue.
>>>>>>
>>>>>> I did run 'strace -c' on qpidd while the test was running to count
>>>>>> the number of system calls, and I noted the big hitters were futex
>>>>>> and write. Interestingly the reads came in 64K chunks, but the writes
>>>>>> were only 2048 bytes at a time. As a result the number of writes
>>>>>> occurring was an order of magnitude bigger than the reads; I left the
>>>>>> detailed results at work, so apologies for not quoting the actual
>>>>>> figures.
>>>>>>
>>>>>> Clive
>>>>>>
>>>>>> On 02/05/2012 20:23, Steve Huston wrote:
>>>>>>> The qpid broker learns how many CPUs are available and will run
>>>>>>> more I/O threads when more CPUs are available (#CPUs + 1 threads).
>>>>>>> It would be interesting to see the results if your VM gets more
>>>>>>> CPUs.
>>>>>>>
>>>>>>> -Steve
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: CLIVE [mailto:clive@ckjltd.co.uk]
>>>>>>>> Sent: Wednesday, May 02, 2012 1:30 PM
>>>>>>>> To: James Kirkland
>>>>>>>> Cc: users@qpid.apache.org
>>>>>>>> Subject: Re: QPID performance on virtual machines
>>>>>>>>
>>>>>>>> James,
>>>>>>>>
>>>>>>>> qpid-perf-test (as supplied with the qpid-0.14 source tarball) runs
>>>>>>>> a direct-queue test when executed without any parameters; there is
>>>>>>>> a command line option that enables this to be changed if required.
>>>>>>>> The message size is 1024 bytes (again the default when not
>>>>>>>> explicitly set), and 500000 messages are published by the test
>>>>>>>> (again the default when not explicitly set). All messages are
>>>>>>>> transient, so I wouldn't expect any file I/O overhead to interfere
>>>>>>>> with the test, and this is confirmed by the vmstat results I am
>>>>>>>> seeing. The only jump in the vmstat output is in the number of
>>>>>>>> context switches, which climbs into the thousands.
>>>>>>>>
>>>>>>>> Clive
>>>>>>>>
>>>>>>>> On 02/05/2012 18:10, James Kirkland wrote:
>>>>>>>>> What sort of messaging scenario is it? Are the messages
>>>>>>>>> persisted? How big are they? If they are persisted, are you using
>>>>>>>>> virtual disks or physical devices?
>>>>>>>>>
>>>>>>>>> CLIVE wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I have been undertaking some performance profiling of QPID
>>>>>>>>>> version 0.14 over the last few weeks, and I have found a
>>>>>>>>>> significant performance drop-off when running QPID in a virtual
>>>>>>>>>> machine.
>>>>>>>>>>
>>>>>>>>>> As an example, if I run qpidd on an 8-core Dell R710 with 36G RAM
>>>>>>>>>> (RHEL5u5) and then run qpid-perf-test (on the same machine, to
>>>>>>>>>> discount any network problems) without any command line
>>>>>>>>>> parameters, I am seeing about 85000 publish transfers/sec and
>>>>>>>>>> 80000 consume transfers/sec. If I run the same scenario on a VM
>>>>>>>>>> (tried both KVM and VMWare ESXi 4.3 running RHEL5u5) with 2 cores
>>>>>>>>>> and 8G RAM, I am seeing only 45000 publish transfers/sec and
>>>>>>>>>> 40000 consume transfers/sec - a significant drop-off in
>>>>>>>>>> performance. Looking at the CPU and memory usage, these would not
>>>>>>>>>> seem to be the limiting factors, as the memory consumption of
>>>>>>>>>> qpidd stays under 200 MBytes and its CPU is up at about 150%;
>>>>>>>>>> hence the two-core machine.
>>>>>>>>>>
>>>>>>>>>> I have even run the same test on my MacBook at home using VMWare
>>>>>>>>>> Fusion 4 (2 cores, 4G RAM) and see the same 45000/40000
>>>>>>>>>> transfers/sec results.
>>>>>>>>>>
>>>>>>>>>> I would expect a small drop-off in performance when running in a
>>>>>>>>>> VM, but not to the extent that I am seeing.
>>>>>>>>>>
>>>>>>>>>> Has anyone else seen this, and if so, were they able to get to
>>>>>>>>>> the bottom of the issue?
>>>>>>>>>>
>>>>>>>>>> Any help would be appreciated.
>>>>>>>>>>
>>>>>>>>>> Clive Lilley
>>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> James Kirkland
>>>>>>>>> Principal Enterprise Solutions Architect
>>>>>>>>> 3340 Peachtree Road, NE,
>>>>>>>>> Suite 1200
>>>>>>>>> Atlanta, GA 30326 USA.
>>>>>>>>> Phone (404) 254-6457<https://www.google.com/voice#phones>
>>>>>>>>> RHCE Certificate: 805009616436562
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>>>>>>> For additional commands, e-mail: users-help@qpid.apache.org
>>>>>>>
>>
>
>

