couchdb-user mailing list archives

From Yogesh Khambia <ykham...@gmail.com>
Subject Re: view index generation for single document and bulk documents
Date Fri, 17 Jun 2011 23:37:09 GMT
Hi Dave,

Thanks a lot for the link to the blogpost.
So far, I am seeing improvements with the daemon generating the view
index after each new document update to the database.
Earlier, while using Chrome's WebKit for latency measurements, I could see
PENDING as the status of the HTTP GET request.
However, with the daemon script updating the view index for each new
document, it now shows the elapsed time instead of the PENDING status.
I expect that the time taken is now due to the lookup and
retrieval of the data.
I have done some benchmarking, although I can provide more results and the
correct hardware specification by Monday.
Here is what I have measured:



                                     Virtual Machine Server      Ubuntu server
                                     (2 cores) with other        (single core)
                                     processes (AMQP, MySQL,
                                     CouchDB) running

Number of documents                  360,517                     1,428,190

Time for generating first view       3 hrs                       3.3 hrs
index (no view index in cache)

Speed of map function                34 documents/sec            120 documents/sec
(no reduce function used)

Add 10 more documents                294 ms                      83.3 ms
(view index generation time)
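
The throughput figures can be cross-checked from the document counts and build times. The following is a minimal sketch of that arithmetic, assuming the reported times are wall-clock durations for the whole first index build; the dictionary keys and the function name are illustrative and not part of any CouchDB API.

```python
# Figures copied from the measurements above; the names are illustrative only.
measurements = {
    "vm_2_cores": {"documents": 360_517, "hours": 3.0},
    "ubuntu_single_core": {"documents": 1_428_190, "hours": 3.3},
}


def docs_per_second(documents, hours):
    """Average indexing throughput over the whole first index build."""
    return documents / (hours * 3600)


# Average throughput per machine, derived from document count and build time.
rates = {
    name: docs_per_second(m["documents"], m["hours"])
    for name, m in measurements.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} documents/sec")
```

The results come out at roughly 33 and 120 documents/sec, in line with the reported 34 and 120 documents/sec.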

My database design is as follows:
{
"testcasename": "MY_TEST",
"testresult": "pass",
"product": "HP",
"series": "10.1.2.200",
"time": [2011, 6, 18, 1, 14, 50]
}

I have used CouchDB's self-generated IDs. Each write to the database
adds documents of this form.
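
For comparison with single-document writes, documents of this shape could be batched into one request to CouchDB's `_bulk_docs` endpoint, which accepts a JSON body of the form `{"docs": [...]}`. The following is a minimal sketch that only builds the request body; the helper names `make_result_doc` and `bulk_docs_body` are hypothetical, and the host, port, and database name in the final comment are assumptions.

```python
import json


def make_result_doc(testcasename, testresult, product, series, time):
    """Build one flat document; no _id is set, so CouchDB generates one."""
    return {
        "testcasename": testcasename,
        "testresult": testresult,
        "product": product,
        "series": series,
        "time": time,
    }


def bulk_docs_body(docs):
    """Serialize documents as the JSON body that _bulk_docs expects."""
    return json.dumps({"docs": list(docs)})


# Three sample documents that differ only in the seconds field of "time".
batch = [
    make_result_doc("MY_TEST", "pass", "HP", "10.1.2.200",
                    [2011, 6, 18, 1, 14, 50 + i])
    for i in range(3)
]
body = bulk_docs_body(batch)

# Assumed target: POST body with Content-Type: application/json to
# http://localhost:5984/<database>/_bulk_docs
print(body)
```

Sending the batch in one request lets CouchDB write the documents together instead of handling one HTTP request per document.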
I am very interested in knowing the following:

- How does the database model affect CouchDB performance for:

   1.  A flat structure as shown above, and
   2.  A nested structure, where objects are nested inside other objects,
   resulting in deep nesting?

- For better performance, which is better: JavaScript MapReduce or Erlang
MapReduce?

I will be happy to share more details if needed.
Any suggestions or tips will be of great help to me.
Thanks.

On Thu, Jun 16, 2011 at 10:27 AM, Dave Cottlehuber <dave@muse.net.nz> wrote:

> On 16 June 2011 10:42, Yogesh Khambia <ykhambia@gmail.com> wrote:
> > Hi all,
> >
> > Currently, I am doing performance tests with database in CouchDB 1.0.1,
> > where a script is continuously writing single documents to the CouchDB
> > database.
> > I had the issue of the user being penalized for reading the view in CouchDB
> > database, by updating the view indexes. To improve on the latency for the
> > view index generation, I wrote a function which generates the view indexes
> > for each new update made to the database.
> > The latency for view index creation has been improved. However, I read from
> > the FAQ on the CouchDB wiki that "The reason not to integrate each doc as it
> > comes in is that it is horribly inefficient and CouchDB is designed to do
> > view index updates very fast, so batching is a good idea."
> > It will be really helpful if somebody can answer me on following:
> >
> > - If  view index generation for each new single document insert is not a
> > good approach?
>
> Hi Yogesh
>
> I think the wiki is pretty clear "it is horribly inefficient".
>
> The reason is that for both docs and views, separate B-tree DB files
> need to be updated. If you bulk-load docs this allows couch to do the
> b-tree balancing all at once for that bulk-load, rather than each time
> per doc. This is a lot more efficient.
> http://horicky.blogspot.com/2008/10/couchdb-implementation.html covers
> well how this works under the hood.
>
> > - How does the single document insert and bulk document insert affect the
> > CouchDB view generation, where:
> >
> >   1.  a daemon script updates view index for each new document insert.
> >   2.  a daemon script updates view index for bulk document
>
> IIRC all views in the same ddoc block while couchdb updates it. Other
> than that the same points above apply; however this will vary heavily,
> especially how your doc ids are sequenced and what your views output,
> hardware, OS, etc etc. Perhaps worth doing some benchmarking and/or
> providing more info on your use case.
>
> Somebody more familiar with the code may be able to add more if you need
> it.
>
> A+
> Dave
>



-- 
Best Regards,
Yogesh Khambia
Postgraduate Design Engineer
Mobile: +31 626 217 381
Email: y.khambia@tue.nl
