incubator-couchdb-user mailing list archives

From Henrik Thostrup Jensen <thost...@gmail.com>
Subject A short _id size performance report and question regarding 0.11 performance
Date Thu, 04 Mar 2010 13:41:56 GMT
Hi

We're running CouchDB in production, and are currently storing around
800K records in it. Lately view performance has started to become a
limiting factor, especially when creating new views or changing
existing ones (which is essentially creating a new view).

However, we are currently using 56 byte _id fields, which I've come to
realize was a bad choice. So I've made a few tests with smaller _id
fields and decided to post them here. Unfortunately we cannot use the
UUIDs assigned by CouchDB, as we rely on the _id field to detect
duplicate records (this is somewhat inherent in the way we collect
distributed information; duplicates don't happen particularly often,
but detecting them is definitely needed). Our data is also somewhat
heterogeneous, and we often generate view keys based on different data
items in the records, including the actual data values (so relational
is a somewhat poor fit for us).
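
To illustrate how a short _id can still detect duplicates, here is a
minimal sketch. It assumes the _id is derived deterministically from
the fields that define a duplicate; the mail does not say how the real
ids are built, so `short_id`, the field values, and the choice of
SHA-1 with hex encoding are all hypothetical.

```python
import hashlib


def short_id(key_fields, length=12):
    """Derive a fixed-length _id from the fields that identify a record.

    Hypothetical scheme: the same fields always give the same id, so a
    second insert of the same record would hit an _id conflict.
    """
    # Join with a separator that is unlikely to appear in the data, so
    # ("ab", "c") and ("a", "bc") do not hash to the same value.
    canonical = "\x1f".join(str(field) for field in key_fields)
    digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
    return digest[:length]


first = short_id(["host-a", "2010-03-04T13:41:56Z", "job-17"])
again = short_id(["host-a", "2010-03-04T13:41:56Z", "job-17"])
other = short_id(["host-b", "2010-03-04T13:41:56Z", "job-17"])
print(first, first == again, first == other)
```

Twelve hex characters only carry 48 bits of the hash; a denser
encoding would pack more bits into the same _id length.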

I've done tests with 56, 22, and 12 byte _id fields. The initial
tests were done with CouchDB 0.10.0 on Karmic. I've tried 0.11 as
well (more on that later in the mail). 4 byte _id fields are
not really possible for us, as we would have a significant chance of
getting different records with the same _id. 8 bytes should be
possible though, but wasn't tested.
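
The collision risk for a given _id size can be estimated with the
birthday bound. The sketch below assumes the ids are uniformly random
over every bit of every byte, which is an idealisation (ids restricted
to printable characters carry fewer bits per byte), and uses the ~800K
record count mentioned above. `collision_probability` is a name made
up for this example.

```python
import math


def collision_probability(num_records, id_bytes):
    """Approximate chance that at least two of num_records ids collide.

    Uses the birthday approximation P ~= 1 - exp(-n(n-1) / (2N)),
    where N = 2**(8 * id_bytes) is the number of possible ids.
    """
    space = 2.0 ** (8 * id_bytes)
    exponent = -num_records * (num_records - 1) / (2.0 * space)
    # -expm1(x) equals 1 - exp(x) but stays accurate for tiny x.
    return -math.expm1(exponent)


for size in (4, 8, 12):
    print(f"{size:2d} bytes: {collision_probability(800_000, size):.3g}")
```

Under these assumptions a collision is practically certain with
4 bytes and on the order of 1 in 10^8 with 8 bytes.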

Test 1:

Insert 70k records into the database (in the same order each time),
in chunks of 100, and measure the db size:
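
A chunked insert of this kind can be sketched as follows. This is not
the script used for the test: it assumes the records go through
CouchDB's `_bulk_docs` endpoint, and the helper names and the database
URL passed to `post_batch` are placeholders.

```python
import json
import urllib.request


def chunked(docs, size=100):
    """Yield successive lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_docs_body(batch):
    """Encode one batch as the JSON body expected by _bulk_docs."""
    return json.dumps({"docs": batch}).encode("utf-8")


def post_batch(db_url, batch):
    """POST one batch to <db_url>/_bulk_docs and return the parsed reply."""
    request = urllib.request.Request(
        db_url + "/_bulk_docs",
        data=bulk_docs_body(batch),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example with made-up records; no request is sent here.
records = [{"_id": "rec%05d" % i, "value": i} for i in range(250)]
print([len(batch) for batch in chunked(records)])
```

Against a running server, each batch would be sent with something like
`post_batch("http://localhost:5984/testdb", batch)`, where the database
name is hypothetical.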

Results:

56 bytes  207.0 MB
22 bytes  175.6 MB
12 bytes  165.8 MB

After compaction:

56 bytes  146.7 MB
22 bytes  125.8 MB
12 bytes  120.0 MB

Test 2:

Construct a simple view over the data:

56 bytes  73 MB
22 bytes  54 MB
12 bytes  47 MB

After compaction:

56 bytes  19 MB
22 bytes  14 MB
12 bytes  12 MB


Test 3:

Time for constructing a temporary view:

56 bytes  70 seconds
22 bytes  57 seconds
12 bytes  53 seconds

In short, smaller _id fields give a nice space reduction and save
a bit of time, but don't make view building significantly faster.
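
The relative savings when going from 56 to 12 byte _id fields can be
worked out from the 0.10.0 numbers above. The snippet only restates
those measurements; the dictionary labels and `reduction_pct` are
names chosen for this example.

```python
# (56 byte result, 12 byte result) taken from tests 1-3 on 0.10.0.
results = {
    "db size (MB)": (207.0, 165.8),
    "db size, compacted (MB)": (146.7, 120.0),
    "view size (MB)": (73.0, 47.0),
    "view size, compacted (MB)": (19.0, 12.0),
    "temp view build (s)": (70.0, 53.0),
}


def reduction_pct(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


for label, (before, after) in results.items():
    print(f"{label:28s} {reduction_pct(before, after):5.1f}% smaller")
```

That works out to roughly 18-20% for the database file, about 36% for
the view file, and about 24% for the temporary view build time.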

I built the current 0.11 branch on Karmic, as collation performance
should have improved in it. I only redid the 12 byte _id tests.

Test 1:
After initial insert: 151.3 MB (a bit smaller than 0.10)
After compaction: 120.0 MB (same as 0.10)

Test 2:
Initial view build size: 153 MB (quite a lot more than 0.10)
After compaction: 12 MB (same as 0.10)

Test 3:
Time for constructing temporary view: 121 seconds (more than twice that of 0.10).

Does anyone have an idea of what could be wrong?
Especially the increased view build time worries me, as I was hoping
0.11 could provide a needed performance boost for us.


Please CC any replies, as I am not subscribed.

-- 
   - Henrik
