From "Johan Liseborn" <johan.liseb...@gmail.com>
Subject Re: Is it possible to evaluate a view on a 20.000 documents database?
Date Fri, 01 Aug 2008 07:37:53 GMT
On Fri, Aug 1, 2008 at 00:38, Demetrius Nunes <demetriusnunes@gmail.com> wrote:
> The view I am trying to create is really simple:
>
> function(doc) {
>   if (doc.classe_id.match(/8a8090a20075ffba010075ffbed600028a8090a20075ffba010075ffbf7200c48a8090a20075ffba010075ffbf7200d9/))
>     emit(doc.id, doc);
> }
>
> It's being applied to a 20,000-document dataset and I've already waited
> several minutes until the CPU cooled off, but to my surprise, the view is
> still taking a long time to respond when I try to run it. I've never actually
> got a result out of it...
>
> Am I doing something wrong?

I guess you have already gotten a number of answers, but just to give
you some additional input (which points in the same direction), here
is some data from a little experiment I just did:

I have a database consisting of documents that describe "projects";
each document has a number of fields, including fields for the
project manager, the due date, an array of project activities (which
in turn have descriptions, an array of assigned workers, etc.), an
array of notes, and a field giving the priority. The point is that
the documents are "semi-complex", or at least I *think* they could be
considered so; I am not sure how much this matters, but it seems to
matter a little, at least when the document itself is part of the
output of the view (which it *isn't* in my example below, but
anyway...).
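
To give you an idea of the shape, a document looks roughly like this
(the field names here are made up for illustration; only "type" and
"priority", which the view below uses, match my actual test data):

{
  "_id": "project-0001",
  "type": "task",
  "project_manager": "Jane Doe",
  "due_date": "2008-09-30",
  "priority": 3,
  "activities": [
    { "description": "Write the spec", "assigned": ["alice", "bob"] }
  ],
  "notes": ["Kickoff meeting held 2008-08-01"]
}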

I am running this on a second-generation MacBook (Core 2 Duo) with
Erlang R12B-3, SMP enabled.

Now, I have a view which gives me the number of projects per priority
level. The view consists of the following map and reduce functions
(mind you, I am not sure that I am doing this entirely correctly; I am
pretty new to using CouchDB (my third day of playing with it,
actually), and I am still figuring the map/reduce stuff out, but the
result of the view seems to be correct):

map: function(doc) { if (doc.type == 'task') emit(doc.priority, 1); }

reduce: function(keys, values) { return sum(values); }
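
For completeness, the two functions live in a design document that,
if I remember the layout correctly, looks something like this (the
exact format may differ between CouchDB versions):

{
  "_id": "_design/tasks",
  "views": {
    "per_prio_count": {
      "map": "function(doc) { if (doc.type == 'task') emit(doc.priority, 1); }",
      "reduce": "function(keys, values) { return sum(values); }"
    }
  }
}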

I just ran a test where I had a database already containing 42,000
project documents (the view had already been indexed for these
documents). I added an additional 10,000 documents, and then ran the
view above like so:

Johans-MacBook% time curl 'localhost:5984/test-001/_view/tasks/per_prio_count?group=true'

The result I got back was:

{"rows":[{"key":1,"value":10391},{"key":2,"value":10399},{"key":3,"value":10482},{"key":4,"value":10320},{"key":5,"value":10408}]}
curl 'localhost:5984/test-001/_view/tasks/per_prio_count2?group=true'
0.01s user 0.03s system 0% cpu 13:32.02 total

Running the view again gave the following result:

{"rows":[{"key":1,"value":10391},{"key":2,"value":10399},{"key":3,"value":10482},{"key":4,"value":10320},{"key":5,"value":10408}]}
curl 'localhost:5984/test-001/_view/tasks/per_prio_count2?group=true'
0.00s user 0.00s system 0% cpu 0.703 total

Finally, I added an additional 10 documents and re-ran the view,
which gave the following result:

{"rows":[{"key":1,"value":10392},{"key":2,"value":10400},{"key":3,"value":10487},{"key":4,"value":10322},{"key":5,"value":10409}]}
curl 'localhost:5984/test-001/_view/tasks/per_prio_count2?group=true'
0.00s user 0.00s system 0% cpu 1.207 total

AFAIU, when you add new documents and then query a view, indexing
will happen, but only for the newly added documents (i.e.,
already-indexed documents will not be re-indexed). I believe this
means that the time to update the index is, in some way, proportional
to the number of *new* documents. I believe I have seen a big-O figure
for this somewhere, but I don't remember right now whether it is O(n),
O(log n), or something else (I am sure someone else on the list can
answer that :-).

As can be seen from the results, when CouchDB had to index the 10,000
new documents, it took about 13 minutes to get the result, but once
all the documents had been indexed, the answer came back in about 0.7
seconds. Indexing only 10 new documents did not take nearly as long,
giving an answer in about 1.2 seconds.
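
In case you want to try something similar, one way to load a batch of
test documents is the _bulk_docs API; a rough sketch (the documents
below are made up, and in practice you would generate the batch with
a small script rather than type it by hand):

curl -X POST 'localhost:5984/test-001/_bulk_docs' \
  -H 'Content-Type: application/json' \
  -d '{"docs": [
        {"type": "task", "priority": 1, "project_manager": "Jane Doe"},
        {"type": "task", "priority": 2, "project_manager": "John Doe"}
      ]}'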

Hope this helps in some way.


Cheers,

johan


P.S.

I am really excited about CouchDB; kudos to Damien and everyone else
involved (sorry, I don't know all of your names yet :-)

-- 
Johan Liseborn
