incubator-couchdb-user mailing list archives

From Daniel Gonzalez <gonva...@gonvaled.com>
Subject Re: Creating a database with lots of documents and updating a view
Date Fri, 16 Mar 2012 09:23:08 GMT
> Number 1 and 2 are true and *should* show logarithmically-increasing
> costs per document (i.e. not too bad).
>
> 500k updates per day is 6 updates per second which CouchDB can easily
> maintain and index.
>
> In that light, would you agree that you are currently enduring the
> long sunk cost of adding and indexing all those documents? (By
> comparison, mkfs.ext3 is painfully slow for large volumes, but we
> don't mind because we know it's a one-off cost.)
>
> Having said all of that, this is one of the problems BigCouch solves.
> It is pretty much API-compatible with CouchDB so you might investigate
> that option too. In my opinion, roughly speaking, you get better
> performance at a cost of a little more operational (sysadmin) work.
>
> --
> Iris Couch

Most of my problems have been alleviated by using a smaller (and ordered)
doc_id: both database creation time and database size have decreased.
I am not sure yet how I will handle updates. Using an increasing doc_id
while the database is new is quite easy: I simply encode a single
increasing integer using an ordered base64 alphabet:

"-@0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ"

Doing the same on updates (modifications, deletions, insertions) will
be more challenging.
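For illustration, a minimal sketch of the scheme described above: an increasing integer rendered in that 64-character alphabet, zero-padded to a fixed width so the generated doc_ids sort in insertion order. The `encode_doc_id` function and the width of 5 are my assumptions, not code from the original setup; note the alphabet is ordered per CouchDB's ICU view collation (punctuation before digits before interleaved aAbB letters), not per ASCII.

```python
# Hypothetical sketch: fixed-width doc_ids from an increasing counter.
# The alphabet is the one quoted above, ordered per CouchDB's ICU
# collation rather than ASCII.
ALPHABET = "-@0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ"
WIDTH = 5  # 64**5 > 1 billion ids, plenty for 22 million documents


def encode_doc_id(n: int, width: int = WIDTH) -> str:
    """Encode a non-negative integer as a fixed-width base-64 string."""
    digits = []
    for _ in range(width):
        n, rem = divmod(n, 64)
        digits.append(ALPHABET[rem])  # least-significant digit first
    if n:
        raise ValueError("integer too large for the chosen width")
    return "".join(reversed(digits))
```

Because every id has the same width, consecutive counters produce ids that collate in the same order they were assigned (e.g. `encode_doc_id(0)` gives `-----` and `encode_doc_id(64)` gives `---@-`).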

Now, regarding view creation time: I understand that there is a
one-time overhead to index the newly created database. I have just
triggered my view on this 22-million-document database, and will be
waiting for it to finish for some time. I triggered it like this:

curl -X GET localhost:5984/mydb/_design/tools/_view/by-number | grep -c .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:21:46 --:--:--     0

And curl is happily waiting for the view to return some results. It is
nice that I can see how much time indexing is taking. What I would
*very* much like to know is: how many documents have already been
indexed? How can I know this? Is there any request that I can send to
CouchDB to get this information? Log files?
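(One place this kind of progress information is exposed is CouchDB's `/_active_tasks` endpoint, which lists running indexers. The exact field names vary between CouchDB versions, so the sample response below is illustrative only; the `indexer_progress` helper is my own sketch, assuming the structured `changes_done`/`total_changes` fields of newer releases.)

```python
import json

# Illustrative /_active_tasks response; in practice you would fetch it
# with: curl localhost:5984/_active_tasks
# Field names are assumptions based on newer CouchDB releases.
sample_response = json.dumps([
    {
        "type": "indexer",
        "database": "mydb",
        "design_document": "_design/tools",
        "changes_done": 11000000,
        "total_changes": 22000000,
        "progress": 50,
    }
])


def indexer_progress(body: str):
    """Return (changes_done, total_changes, progress %) per indexer task."""
    return [
        (t["changes_done"], t["total_changes"], t["progress"])
        for t in json.loads(body)
        if t.get("type") == "indexer"
    ]
```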

Thanks,
Daniel Gonzalez
