couchdb-user mailing list archives

From Daniel Gonzalez <gonva...@gonvaled.com>
Subject Creating a database with lots of documents and updating a view
Date Tue, 13 Mar 2012 22:45:39 GMT
Hi,

I am creating a database with lots of documents (3 million).
I have a view in the database:

function(doc) {
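    // one row per ported number, so the receiving operator can be looked up by key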
    if (doc.PORTED_NUMBER) emit(doc.PORTED_NUMBER, doc.RECEIVING_OPERATOR);
}

To speed up view creation, I am doing the following (Strategy A):

   1. Define view
   2. Insert 1000 documents
   3. Access the view
   4. Goto 2

And I repeat this process until all documents have been inserted.
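For reference, here is a stripped-down Python sketch of what that loop does. The database name, design-document name, and sample documents below are placeholders, not my real ones; only the HTTP calls against CouchDB matter:

import json
import requests

COUCH = "http://localhost:5984"
DB = "portings"                          # placeholder database name
JSON = {"Content-Type": "application/json"}

# 1. Define the view by storing its map function in a design document.
design = {
    "_id": "_design/lookup",             # placeholder design-doc name
    "views": {
        "by_ported_number": {
            "map": "function(doc) {"
                   "  if (doc.PORTED_NUMBER)"
                   "    emit(doc.PORTED_NUMBER, doc.RECEIVING_OPERATOR);"
                   "}"
        }
    },
}
requests.put("%s/%s" % (COUCH, DB))      # a 412 here just means the db exists
requests.put("%s/%s/%s" % (COUCH, DB, design["_id"]),
             data=json.dumps(design), headers=JSON)

def bulk_send(docs):
    # 2. Insert one batch of documents through _bulk_docs.
    r = requests.post("%s/%s/_bulk_docs" % (COUCH, DB),
                      data=json.dumps({"docs": docs}), headers=JSON)
    r.raise_for_status()

def touch_view():
    # 3. Query the view. limit=0 returns no rows, but it still forces the
    #    indexer to fold the freshly inserted batch into the index.
    r = requests.get("%s/%s/_design/lookup/_view/by_ported_number"
                     % (COUCH, DB), params={"limit": 0})
    r.raise_for_status()

# Placeholder documents standing in for the real 3 million records.
all_docs = [{"PORTED_NUMBER": "34600%06d" % n, "RECEIVING_OPERATOR": "OP1"}
            for n in range(10000)]

# 4. Repeat in batches of 1000 until everything has been inserted.
for i in range(0, len(all_docs), 1000):
    bulk_send(all_docs[i:i + 1000])
    touch_view()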

I have read that this is faster than my previous strategy (Strategy B, now
abandoned):

   1. Insert all documents
   2. Define view
   3. Access view

My problem is that, with Strategy A, step 3 is taking longer and longer.
I currently have around 300,000 documents inserted, and each view access is
taking around 120 s.
The delay in view access has evolved as follows:

2012-03-13 23:01:40,405 - __main__             - INFO       -       - BulkSend >> requested=   1000 ok=   1000 errors=      0
2012-03-13 23:03:29,589 - __main__             - INFO       -       - View ready, ellapsed 109
2012-03-13 23:03:32,945 - __main__             - INFO       -       - BulkSend >> requested=   1000 ok=   1000 errors=      0
2012-03-13 23:05:31,699 - __main__             - INFO       -       - View ready, ellapsed 118
2012-03-13 23:05:35,106 - __main__             - INFO       -       - BulkSend >> requested=   1000 ok=   1000 errors=      0
2012-03-13 23:07:28,392 - __main__             - INFO       -       - View ready, ellapsed 113
2012-03-13 23:07:31,663 - __main__             - INFO       -       - BulkSend >> requested=   1000 ok=   1000 errors=      0
2012-03-13 23:09:26,929 - __main__             - INFO       -       - View ready, ellapsed 115
2012-03-13 23:09:30,572 - __main__             - INFO       -       - BulkSend >> requested=   1000 ok=   1000 errors=      0
2012-03-13 23:11:27,490 - __main__             - INFO       -       - View ready, ellapsed 116
2012-03-13 23:11:30,784 - __main__             - INFO       -       - BulkSend >> requested=   1000 ok=   1000 errors=      0
2012-03-13 23:13:21,575 - __main__             - INFO       -       - View ready, ellapsed 110
2012-03-13 23:13:24,937 - __main__             - INFO       -       - BulkSend >> requested=   1000 ok=   1000 errors=      0
2012-03-13 23:15:23,519 - __main__             - INFO       -       - View ready, ellapsed 118
2012-03-13 23:15:26,836 - __main__             - INFO       -       - BulkSend >> requested=   1000 ok=   1000 errors=      0
2012-03-13 23:17:23,036 - __main__             - INFO       -       - View ready, ellapsed 116
2012-03-13 23:17:26,310 - __main__             - INFO       -       - BulkSend >> requested=   1000 ok=   1000 errors=      0

It started at around 1 s and has been increasing more or less monotonically.
The import has been running for 7 hours now, and only 300,000 documents have
been inserted and indexed.
I do not know what mathematical function the delay is following (to me it
looks exponential), but if it keeps growing like this, importing all 3 million
documents is going to take forever: even if the delay stopped growing at
~115 s per 1,000-document batch, the remaining ~2.7 million documents would
need about 2,700 more batches, i.e. roughly 86 more hours.

Is there a way to speed this up?

Thanks!
Daniel
