incubator-couchdb-user mailing list archives

From Robert Newson <rnew...@apache.org>
Subject Re: Creating a database with lots of documents and updating a view
Date Tue, 13 Mar 2012 23:33:52 GMT
You can see progress in /_active_tasks or in Futon; it should show the
update sequence number. The four-year-old page you link to might have
been accurate in 2008, but I don't think it's true now. You should
expect view building to slow down along an O(log n) curve, as befits a
B+tree. The numbers you see with very few documents are
unrealistically high because everything fits in the disk cache. If you
were to graph it, you would see a very high peak that quickly softens
into a curve.

B.
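
A minimal sketch of polling /_active_tasks for indexer progress, as suggested above. The field names ("type", "database", "design_document", "progress") are an assumption based on the JSON that CouchDB 1.2 and later report; older releases expose a free-text "status" string instead. The base URL is hypothetical, and the endpoint may require admin credentials on a secured server.

```python
import json
from urllib.request import urlopen


def indexer_progress(tasks):
    """Return (database, design_document, progress) for each indexer task.

    `tasks` is the decoded JSON list returned by GET /_active_tasks.
    Field names are assumed from the CouchDB 1.2+ task format.
    """
    return [
        (t.get("database"), t.get("design_document"), t.get("progress"))
        for t in tasks
        if t.get("type") == "indexer"
    ]


def fetch_active_tasks(base_url="http://localhost:5984"):
    """Fetch and decode /_active_tasks (base_url is a placeholder)."""
    with urlopen(base_url + "/_active_tasks") as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling indexer_progress(fetch_active_tasks()) every few seconds while the view builds should show the progress value climbing; an empty list means no indexer is running at that moment.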

On 13 March 2012 23:08, Daniel Gonzalez <gonvaled@gonvaled.com> wrote:
> Hi,
>
> I have no reduce on the view, and that is my only view.
> I *am* doing bulk inserts (1000 documents), and after each bulk insert, I
> access the view. (my assumption is that this will be faster than accessing
> the view once at the end of inserting the 3 million documents)
>
> I know the numbers will vary widely, but: what is the expected
> view indexing time for the view that I posted and for an amount of 3
> million documents?
>
> How can I monitor view creation? (how many documents have already been
> indexed)
>
> I got the idea that "bulk insert + view access + repeat" was faster than
> "full insert + view access" here:
> http://iamseanmurphy.com/2008/09/08/couchdb-view-generation/
>
> Thanks,
> Daniel
>
> On Tue, Mar 13, 2012 at 11:58 PM, Robert Newson <rnewson@apache.org> wrote:
>
>> The view build is already batched. In my opinion your strategy A can
>> only ever be slower or the same speed as B.
>>
>> Try inserting the docs using _bulk_docs, it'll go much faster. I'd
>> fill the database up and hit the view at the end for the fastest build
>> time, but I'd still expect it to take a while to build the view the
>> first time.
>>
>> Do you have a reduce on the view? Are there other views in the same
>> design document?
>>
>> B.
>>
>> On 13 March 2012 22:45, Daniel Gonzalez <gonvaled@gonvaled.com> wrote:
>> > Hi,
>> >
>> > I am creating a database with lots of documents (3 million).
>> > I have a view in the database:
>> >
>> > function(doc) {
>> >    if (doc.PORTED_NUMBER) emit(doc.PORTED_NUMBER, doc.RECEIVING_OPERATOR);
>> > }
>> >
>> > To speed up view creation, I am doing the following (Strategy A)
>> >
>> >   1. Define view
>> >   2. Insert 1000 documents
>> >   3. Access the view
>> >   4. Goto 2
>> >
>> > And I repeat this process until all documents have been inserted.
>> >
>> > I have read that this is faster than my previous strategy (Strategy B,
>> > obsolete):
>> >
>> >   1. Insert all documents
>> >   2. Define view
>> >   3. Access view
>> >
>> > My problem is that, in my current Strategy A, step 3 is taking longer and
>> > longer. Currently I have around 300 thousand documents inserted and view
>> > access is taking around 120s.
>> > The evolution of the delay in view access has been:
>> >
>> > 2012-03-13 23:01:40,405 - __main__ - INFO - BulkSend >> requested=   1000 ok=   1000 errors=      0
>> > 2012-03-13 23:03:29,589 - __main__ - INFO - View ready, ellapsed 109
>> > 2012-03-13 23:03:32,945 - __main__ - INFO - BulkSend >> requested=   1000 ok=   1000 errors=      0
>> > 2012-03-13 23:05:31,699 - __main__ - INFO - View ready, ellapsed 118
>> > 2012-03-13 23:05:35,106 - __main__ - INFO - BulkSend >> requested=   1000 ok=   1000 errors=      0
>> > 2012-03-13 23:07:28,392 - __main__ - INFO - View ready, ellapsed 113
>> > 2012-03-13 23:07:31,663 - __main__ - INFO - BulkSend >> requested=   1000 ok=   1000 errors=      0
>> > 2012-03-13 23:09:26,929 - __main__ - INFO - View ready, ellapsed 115
>> > 2012-03-13 23:09:30,572 - __main__ - INFO - BulkSend >> requested=   1000 ok=   1000 errors=      0
>> > 2012-03-13 23:11:27,490 - __main__ - INFO - View ready, ellapsed 116
>> > 2012-03-13 23:11:30,784 - __main__ - INFO - BulkSend >> requested=   1000 ok=   1000 errors=      0
>> > 2012-03-13 23:13:21,575 - __main__ - INFO - View ready, ellapsed 110
>> > 2012-03-13 23:13:24,937 - __main__ - INFO - BulkSend >> requested=   1000 ok=   1000 errors=      0
>> > 2012-03-13 23:15:23,519 - __main__ - INFO - View ready, ellapsed 118
>> > 2012-03-13 23:15:26,836 - __main__ - INFO - BulkSend >> requested=   1000 ok=   1000 errors=      0
>> > 2012-03-13 23:17:23,036 - __main__ - INFO - View ready, ellapsed 116
>> > 2012-03-13 23:17:26,310 - __main__ - INFO - BulkSend >> requested=   1000 ok=   1000 errors=      0
>> >
>> > It started at around 1s, and it is increasing more or less monotonically.
>> > It has been running for 7 hours, and only 300000 documents have
>> > been imported and indexed.
>> > If everything continues like this (I do not know what kind of mathematical
>> > function this is following, but to me it looks exponential),
>> > importing the 3 million documents is going to take forever.
>> >
>> > Is there a way to speed this up?
>> >
>> > Thanks!
>> > Daniel
>>
