Subject: Re: Read request throughput
From: Adam Kocoloski <adam.kocoloski@gmail.com>
Date: Thu, 2 Dec 2010 09:41:55 -0500
To: user@couchdb.apache.org

On Dec 2, 2010, at 6:29 AM, Huw Selley wrote:

>> include_docs=true is definitely more work at read time than embedding the docs in the view index. I'm not sure about your application design constraints, but given that your database and index seem to fit entirely in RAM at the moment, you could experiment with emitting the doc in your map function instead ...
>>
>>> The total amount of data returned from the request is 1467 bytes.
>>
>> ... especially when the documents are this small.
>
> Sure, but I would have expected that to only really help if the system was contending for resources? I am using linked docs, so I'm not sure about emitting the entire doc in the view.

Didn't realize you were using linked docs. You're certainly right, there's no way to emit those directly.

>> Hmm, I've heard that we did something to break compatibility with R12B-5 recently. We should either fix it or bump the required version. Thanks for the note.
>
> COUCHDB-856?

Ah, right. That one was my fault. But Filipe fixed it in r1034380, so it shouldn't have caused you any trouble here.

>> Do you know if the CPU load was spread across cores or concentrated on a single one? One thing Kenneth did not mention in that thread is that you can now bind Erlang schedulers to specific cores. By default the schedulers are unbound; maybe RHEL is doing a poor job of distributing them. You can bind them using the default strategy for your CPUs by starting the VM with the "+sbt db" option.
>
> It was using most of 2 cores. I had a go with "+sbt db" and it didn't perform as well as "-S 16:2".
>
> WRT disabling HT - I need to take a trip to the datacentre to disable HT in the BIOS, but I tried disabling some cores with:
>
>   echo 0 > /sys/devices/system/node/nodeX/cpuX/online
>
> which should stop the kernel from seeing the core - not as clean as disabling it in the BIOS, but it should suffice. /proc/cpuinfo stopped showing the cores I removed, so it looks like it worked.
> Again, I didn't see any improvement.
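For reference, both of those settings are standard erl(1) emulator flags. A minimal sketch of passing them to the VM, assuming a launcher that ultimately execs erl - ERL_FLAGS is read by the runtime itself, so it should apply regardless of the wrapper, and "couchdb" below stands in for however the server is actually started:

  # Bind schedulers to cores using the default binding type:
  ERL_FLAGS="+sbt db" couchdb

  # Start 16 schedulers with only 2 online; the "-S 16:2" quoted
  # above presumably maps to erl's +S Schedulers:SchedulersOnline:
  ERL_FLAGS="+S 16:2" couchdb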
Ok, interesting. When you request an up-to-date view there are basically seven Erlang processes involved: one HTTP connection handler, two couch_file servers (one for the .couch file and one for the .view file), a couch_db server, a couch_view_group server, and then two registered processes (couch_server and couch_view). When you send additional concurrent requests for the same view, CouchDB spawns off additional HTTP handlers to do things like JSON encoding and header processing, but the other six processes just need to handle the additional load themselves.

The fact that you only saw two cores regularly used suggests that one of these processes turned into a bottleneck (and when they weren't blocked, the other processes ran on the second core). My guess would be the DB couch_file, since every view request was hitting it multiple times: once to open the ddoc and N times to load the linked documents (see the sketch at the end of this message). But that's just a guess. I'm mildly surprised that you see a significant gain from dropping down to 2 active schedulers, and it's not a mode of operation I would recommend if you plan to have multiple active databases. But I can see where it might help this particular benchmark a bit.

This is the first time I've seen someone try to maximize the throughput for this particular type of request, so I don't have any more bright suggestions. If I'm right about the cause of the bottleneck I can think of new optimizations we might add to reduce it in the future, but nothing in terms of tweaks to the server config.

Regards,
Adam

> Cheers
> Huw
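For reference, a minimal sketch of the linked-documents pattern behind that per-row document load (the database, design doc, and field names are hypothetical; the _id-resolving behavior of include_docs=true on emitted values is standard CouchDB):

  # Each row's value names another document by _id, so
  # include_docs=true loads *that* document at read time --
  # one extra read through the DB couch_file per row:
  curl -X PUT http://127.0.0.1:5984/mydb/_design/app \
       -H 'Content-Type: application/json' \
       -d '{"views": {"linked": {"map":
            "function(doc) { if (doc.ref) emit(doc._id, {\"_id\": doc.ref}); }"}}}'

  # Rows come back with the linked document, not the emitting one:
  curl 'http://127.0.0.1:5984/mydb/_design/app/_view/linked?include_docs=true'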