couchdb-user mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Suggestions on optimizing document look up
Date Wed, 01 Apr 2009 21:06:07 GMT
If your docid gets repeated, then you could very well use an LRU
cache to get what you want. To make sure you're not missing updates,
a HEAD call to check the ETag would probably be best.
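
Here is a rough sketch of that idea in Python, assuming the requests
library and a database at http://127.0.0.1:5984/db_name (the URL, names
and cache size are only illustrative, adjust them to your setup):

from collections import OrderedDict
import requests

BASE = "http://127.0.0.1:5984/db_name"

class EtagLRU(object):
    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.cache = OrderedDict()  # docid -> (etag, doc)

    def get(self, docid):
        entry = self.cache.get(docid)
        if entry is not None:
            etag, doc = entry
            # Cheap HEAD request: if the ETag still matches, the cached
            # copy is current and we skip the full GET.
            head = requests.head("%s/%s" % (BASE, docid))
            if head.headers.get("ETag") == etag:
                self.cache.move_to_end(docid)
                return doc
        # Miss or stale entry: fetch the document and remember its ETag.
        resp = requests.get("%s/%s" % (BASE, docid))
        resp.raise_for_status()
        doc = resp.json()
        self.cache[docid] = (resp.headers.get("ETag"), doc)
        self.cache.move_to_end(docid)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return doc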

Also, if you have access to some number of document ids, you can
fetch multiple documents simultaneously by POST'ing a {"keys":
[docid1, docid2, docid3, ...]} to
http://127.0.0.1:5874/db_name/_all_docs?include_docs=true
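
In Python that could look something like this (again assuming the
requests library; the database name, port, and batch contents are only
illustrative):

import json
import requests

def fetch_docs(docids, base="http://127.0.0.1:5984/db_name"):
    # One POST to _all_docs?include_docs=true returns all the requested
    # documents in a single round trip.
    resp = requests.post(
        "%s/_all_docs?include_docs=true" % base,
        headers={"Content-Type": "application/json"},
        data=json.dumps({"keys": docids}),
    )
    resp.raise_for_status()
    # Each row carries the full document under "doc"; ids that don't
    # exist come back with an "error" field instead.
    return dict((row["key"], row.get("doc")) for row in resp.json()["rows"])

docs = fetch_docs(["1", "2", "3"])

Fetching documents in batches of a few hundred ids at a time should cut
the number of round trips dramatically compared to one GET per id.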

HTH,
Paul Davis

On Wed, Apr 1, 2009 at 3:34 PM, Manjunath Somashekhar
<manjunath_somashekhar@yahoo.com> wrote:
>
> hi All,
>
> Buoyed by the response I got to my previous mail (Suggestions on View performance optimization/improvement), I am asking another question, this time about optimizing document lookup based on _id.
>
> Let us say we have a db containing a million documents, each with an _id generated by us [1..1000000]. If we have to get all the documents one by one (assuming the search/lookup code will get random inputs of [1..1000000]), what would work best?
>
> As of now what we are doing is a simple lookup like:
> def getDocById(self, id):
>     return self.db[id]
>
> Doing a million lookups like this takes about 50-60 minutes on my laptop. Is there a better way of doing the same? I thought of fetching a bunch of keys in one go, caching them (LRU style), and looking up the cache first before hitting the db, but given that the input 'id' varies randomly between [1..1000000], it has not been a great success.
>
> Any thoughts? Ideas? Suggestions?
>
> Environment details:
> Couchdb - 0.9.0a757326
> Erlang - 5.6.5
> Linux kernel - 2.6.24-23-generic #1 SMP Mon Jan 26 00:13:11 UTC 2009 i686 GNU/Linux
> Ubuntu distribution
> Centrino Dual core, 4GB RAM laptop
>
> Thanks
> Manju
>
>
>
>
