From: J Chris Anderson
Subject: Re: Some stats about couch DB
Date: Fri, 23 Jul 2010 18:42:05 -0700
Message-Id: <673E2CFF-D6F7-4077-BB43-450B09EC089A@apache.org>
Reply-To: user@couchdb.apache.org
To:
user@couchdb.apache.org

On Jul 23, 2010, at 5:01 PM, Talib Sharif wrote:

> Hi All,
>
> As I am playing more and more with CouchDB (it is relaxing and fun), I am just trying to understand the limits and the expectations in a large production system environment.
>
> Right now I have about 100K documents and about 10 different views; one of the views does about 100 emits per document.
>
> As I am building the view indexes, it is taking about 7-8 hours.

this is about right for 10 million rows. That works out to about 350 rows per second (maybe more depending on what your other views are doing), which is a bit slower than I'm used to seeing, but it depends on the size of your emitted keys and values. If you can shrink the keys or the values you should see some speedup (marginal, not an order of magnitude).

because view generation is incremental, in production the 7-8 hours isn't the big issue; it's whether view generation can keep up with the insert rate. So if you are generating fewer than a few documents per second (x 100 emitted rows) then you should be able to keep the indexes current. If the indexes start to fall behind I'd suggest either upgrading hardware or moving to a clustered solution like CouchDB-Lounge.

for purposes of prototyping you will probably be happier working on a subset of the documents.

> I would like to know: how are other people using it?
> Is 7-8 or even 24 hours of checkpointing view generation typical?
> How many documents do people have?
> How is other people's experience generating a view on, let's say, 1 million documents?
>
> I have switched to the native _sum function for reduce. What else is taking long? Is it the map function written in JavaScript? Is it the index that's getting too big?
>

using an Erlang view function could potentially speed things up (but my guess is you are more likely disk-IO bound, not CPU bound, so maybe it won't make much difference).

> Is the view generation linear, or does it get worse when you have more documents?

the btree should get slower at roughly O(log n), where n is the number of rows. The base of the log is pretty big, too. Once you get into billion-row territory you'll probably want to look more closely at CouchDB-Lounge or the Cloudant clustering.

> I would greatly appreciate help in answering or discussing these questions.
>
> Thanks in advance,
> Talib
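PS: to put the numbers above together, and to sketch the kind of view being described (one map function emitting roughly 100 rows per document, reduced with the built-in _sum): the design document below is a hypothetical illustration, not the actual view from this thread. The `tags` field and the `stats`/`by_tag` names are assumptions; adapt them to your schema.

```javascript
// Back-of-the-envelope check of the throughput quoted above:
// 100,000 docs x 100 emits = 10,000,000 rows in ~7.5 hours.
const rows = 100000 * 100;
const seconds = 7.5 * 60 * 60;
const rowsPerSec = Math.round(rows / seconds);
console.log(rowsPerSec); // about 370 rows per second

// A sketch of a design document along these lines: one map
// function emitting ~100 rows per document (here, one per tag),
// reduced with the native _sum. Field and view names are made up.
const designDoc = {
  _id: "_design/stats",
  views: {
    by_tag: {
      // Keep emitted keys and values small: short strings and
      // plain numbers index much faster than large structures.
      map: function (doc) {
        (doc.tags || []).forEach(function (tag) {
          emit(tag, 1);
        });
      }.toString(),
      reduce: "_sum" // built-in reduce; faster than a JS reduce
    }
  }
};
```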
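PPS: the "big base of the log" point can be made concrete. Assuming a btree fanout in the low hundreds (the real value depends on your key sizes, so treat the 100 below as an illustration only), tree depth grows very slowly with row count:

```javascript
// Rough btree depth estimate: depth ~ log_fanout(n).
// The fanout of 100 is an assumed value for illustration.
function btreeDepth(rowCount, fanout) {
  return Math.ceil(Math.log(rowCount) / Math.log(fanout));
}

console.log(btreeDepth(10e6, 100)); // 10 million rows -> depth 4
console.log(btreeDepth(1e9, 100));  // 1 billion rows  -> depth 5
```

so going from 10 million to a billion rows only adds about one level of tree traversal per lookup, which is why index reads stay fast even as views get large.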