incubator-couchdb-user mailing list archives

From "Demetrius Nunes" <demetriusnu...@gmail.com>
Subject Re: Is it possible to evaluate a view on a 20.000 documents database?
Date Fri, 01 Aug 2008 12:45:51 GMT
Johan, this was excellent stuff. Thanks for the enlightenment.
As for the question of why I am using a regex instead of a plain string
comparison: the "classe_id" field is structured so that it starts with a
fixed string (say ABC) but can end in several different ways, so the
documents may have field values of ABCDEF, ABCGHI, ABCRST, etc. So I am
just checking whether the classe_id field starts with a certain string. Is
regex comparison a heavy operation in JavaScript?
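
For illustration, the prefix check described above could be sketched in plain JavaScript as follows. The helper names (startsWithRegex, startsWithPlain), the ABC prefix, and the local emit stand-in are assumptions made up for this sketch, not CouchDB APIs. Note that a pattern without a leading ^ matches anywhere in the string, not only at the start.

```javascript
// Hypothetical helpers (not part of CouchDB): two ways to test whether a
// field value starts with a given prefix.
var PREFIX = 'ABC'; // illustrative prefix, taken from the ABCDEF/ABCGHI example

// Anchored regex: the leading ^ restricts the match to the start of the string.
function startsWithRegex(value) {
  return /^ABC/.test(value);
}

// Plain string comparison: compare the first PREFIX.length characters.
function startsWithPlain(value) {
  return typeof value === 'string' &&
    value.substring(0, PREFIX.length) === PREFIX;
}

// Local stand-in for the emit() that the CouchDB view server provides.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Sketch of a map function using the plain comparison. Documents that lack
// the classe_id field are skipped instead of raising an error.
function map(doc) {
  if (startsWithPlain(doc.classe_id)) {
    emit(doc._id, null);
  }
}
```

Either check is cheap per document; the plain comparison simply avoids the regex engine altogether.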

rgds,
Demetrius

On Fri, Aug 1, 2008 at 4:37 AM, Johan Liseborn <johan.liseborn@gmail.com> wrote:

> On Fri, Aug 1, 2008 at 00:38, Demetrius Nunes <demetriusnunes@gmail.com>
> wrote:
> > The view I am trying to create is really simple:
> >
> > function(doc) {
> >   if (doc.classe_id.match(/8a8090a20075ffba010075ffbed600028a8090a20075ffba010075ffbf7200c48a8090a20075ffba010075ffbf7200d9/))
> >     emit(doc.id, doc);
> > }
> >
> > It's being applied to a 20.000-document dataset and I've already waited
> > several minutes until the CPU cooled off, but to my surprise, the view is
> > still taking a long time to respond when I try to run it. I've never
> > actually gotten a result out of it...
> >
> > Am I doing something wrong?
>
> I guess you have already gotten a number of answers, but just to give
> you some additional input (which points in the same direction), here
> is some data from a little experiment I just did:
>
> I have a database consisting of documents that describe "projects";
> each document has a number of fields including fields for project
> manager, due date, an array of project activities (which in turn has
> descriptions, an array of assigned workers, etc), an array of notes,
> and a field giving the priority (the point being the documents are
> "semi-complex", or at least I *think* they could be considered so; I
> am not sure how much this matters, but it seems to matter a little, at
> least when the document itself is part of the output of the view
> (which it *isn't* in my example below, but anyway...)).
>
> I am running this on a second generation MacBook (core 2 duo) with
> Erlang R12B-3, SMP enabled.
>
> Now, I have a view which gives me the number of projects per priority
> level. The view consists of the following map and reduce functions
> (mind you, I am not sure that I am doing this entirely correctly, I am
> pretty new to using CouchDB (my third day of playing with it, actually),
> and I am still figuring the map/reduce stuff out; the result of the
> view seems to be correct though):
>
> map: function(doc) { if (doc.type == 'task') emit(doc.priority, 1); }
>
> reduce: function(keys, values) { return sum(values); }
>
> I just ran a test where I had a database already consisting of 42.000
> project documents (the view had already been indexed on these
> documents). I added an additional 10.000 documents, and then ran the
> view above like so:
>
> Johans-MacBook% time curl
> 'localhost:5984/test-001/_view/tasks/per_prio_count2?group=true'
>
> The result I got back was:
>
>
> {"rows":[{"key":1,"value":10391},{"key":2,"value":10399},{"key":3,"value":10482},{"key":4,"value":10320},{"key":5,"value":10408}]}
> curl 'localhost:5984/test-001/_view/tasks/per_prio_count2?group=true'
> 0.01s user 0.03s system 0% cpu 13:32.02 total
>
> Running the view again gave the following result:
>
>
> {"rows":[{"key":1,"value":10391},{"key":2,"value":10399},{"key":3,"value":10482},{"key":4,"value":10320},{"key":5,"value":10408}]}
> curl 'localhost:5984/test-001/_view/tasks/per_prio_count2?group=true'
> 0.00s user 0.00s system 0% cpu 0.703 total
>
> As the last part, I added an additional 10 documents and then re-ran
> the view, giving the following result:
>
>
> {"rows":[{"key":1,"value":10392},{"key":2,"value":10400},{"key":3,"value":10487},{"key":4,"value":10322},{"key":5,"value":10409}]}
> curl 'localhost:5984/test-001/_view/tasks/per_prio_count2?group=true'
> 0.00s user 0.00s system 0% cpu 1.207 total
>
> AFAIU, when you add new documents and then evaluate a view including
> those documents, indexing will happen, but only for the newly added
> documents (i.e. already indexed documents will not be re-indexed). I
> believe this means that the time to index will be, in some way,
> proportional to the number of *new* documents. I believe I have seen a
> big-O "number" for this somewhere, but I don't remember right now if
> it is O(n), O(log n), or something else (I am sure someone else on the
> list can answer that :-).
>
> As can be seen from the results, when CouchDB had to index the 10.000
> new documents, it took about 13 minutes to get the result, but when
> all the documents had been indexed, the answer came back in 0.7
> seconds. Having to index 10 documents did not take that long, giving
> an answer in 1.2 seconds.
>
> Hope this helps in some way.
>
>
> Cheers,
>
> johan
>
>
> P.S.
>
> I am really excited about CouchDB; kudos to Damien and everyone else
> involved (sorry, I don't know all of your names yet :-)
>
> --
> Johan Liseborn
>
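
The quoted map and reduce functions can be tried outside CouchDB with a small simulation. This is only a rough sketch: emit and sum here are local stand-ins for the helpers the view server normally supplies, and countPerPriority, mapFn, reduceFn, and sampleDocs are names invented for the example. It approximates what a ?group=true query returns and says nothing about how CouchDB stores the index.

```javascript
// Local stand-ins for helpers the CouchDB view server normally provides.
var mapRows = [];
function emit(key, value) {
  mapRows.push({ key: key, value: value });
}
function sum(values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) total += values[i];
  return total;
}

// Map and reduce functions as quoted in the message.
function mapFn(doc) { if (doc.type == 'task') emit(doc.priority, 1); }
function reduceFn(keys, values) { return sum(values); }

// Run the map over every document, then reduce the emitted values per key.
function countPerPriority(docs) {
  mapRows = [];
  docs.forEach(mapFn);
  var grouped = {};
  mapRows.forEach(function (row) {
    (grouped[row.key] = grouped[row.key] || []).push(row.value);
  });
  return Object.keys(grouped).sort().map(function (key) {
    return { key: Number(key), value: reduceFn(null, grouped[key]) };
  });
}

var sampleDocs = [
  { type: 'task', priority: 1 },
  { type: 'task', priority: 2 },
  { type: 'task', priority: 1 },
  { type: 'note', priority: 1 } // ignored by the map: not a task
];
console.log(JSON.stringify(countPerPriority(sampleDocs)));
// prints [{"key":1,"value":2},{"key":2,"value":1}]
```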

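The incremental behaviour Johan describes, where only documents added since the last query are indexed, can be pictured with a toy model like the one below. It is an assumption-laden illustration rather than CouchDB's actual implementation: createToyIndex, the seq field, and emitRow are hypothetical names, and the model ignores updates and deletions of existing documents.

```javascript
// Toy model of incremental view indexing: remember the highest sequence
// number already indexed and map only the documents that came after it.
function createToyIndex(mapDoc) {
  var lastSeq = 0;     // highest sequence number indexed so far
  var indexRows = [];  // accumulated map output
  return {
    // docs: array of { seq: number, doc: object }, ordered by seq
    update: function (docs) {
      var processed = 0;
      docs.forEach(function (entry) {
        if (entry.seq > lastSeq) {
          mapDoc(entry.doc, function (key, value) {
            indexRows.push({ key: key, value: value });
          });
          lastSeq = entry.seq;
          processed++;
        }
      });
      return processed; // number of documents indexed by this call
    },
    rows: function () { return indexRows.slice(); }
  };
}

var toyIndex = createToyIndex(function (doc, emitRow) {
  if (doc.type == 'task') emitRow(doc.priority, 1);
});
var changes = [
  { seq: 1, doc: { type: 'task', priority: 1 } },
  { seq: 2, doc: { type: 'task', priority: 3 } }
];
var firstRun = toyIndex.update(changes);   // indexes both documents
changes.push({ seq: 3, doc: { type: 'task', priority: 2 } });
var secondRun = toyIndex.update(changes);  // indexes only the new document
console.log(firstRun, secondRun);          // prints 2 1
```

In this model the cost of a query after new writes depends on how many documents were added since the previous query, which matches the timings quoted above (about 13 minutes for 10.000 new documents, about a second for 10).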


-- 
____________________________
http://www.demetriusnunes.com
