couchdb-user mailing list archives

From Simon Metson <>
Subject Re: Slow filtered _changes feed
Date Wed, 20 Oct 2010 15:16:02 GMT
	That will improve things, but it still potentially has to skip  
through a lot of records to return the requested limit of records. Say you  
have a very prolific user and one who is less active. The prolific guy  
has 1000 messages for every one the laid-back guy gets. If you're  
filtering by user, you're going to have to go through ~10,000 records to  
return the less active user's 10 documents (assuming I'm understanding  
_changes right...). The problem is not getting the 10 documents out  
but skipping the ~10,000 documents that don't match the filter.
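To make that arithmetic concrete, here is a small Python sketch (purely illustrative, not CouchDB code) of how many rows a sequence-ordered filter has to examine before `limit` matches come back:

```python
# Illustration of filtered _changes cost: the feed is scanned in sequence
# order, and non-matching rows are examined and discarded, not skipped.
def rows_scanned_for_limit(feed, user, limit):
    """Count how many rows must be examined to return `limit` matches."""
    matched = scanned = 0
    for row in feed:
        scanned += 1
        if row == user:
            matched += 1
            if matched == limit:
                break
    return scanned

# 1000 messages from the prolific user for every one from the quiet user:
feed = (["prolific"] * 1000 + ["quiet"]) * 10
print(rows_scanned_for_limit(feed, "quiet", 10))  # 10010 rows for 10 docs
```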

If the records are split across databases (one per user) you'll only  
see the changes relevant to your user. Of course that might not be  
possible for your use case...
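For what it's worth, a database-per-user layout might look like the sketch below. The `userdb-` prefix is just a hypothetical naming convention, not anything CouchDB mandates; the point is that an unfiltered _changes feed on such a database only ever contains that one user's documents:

```python
# Hypothetical database-per-user naming scheme: each user's documents live
# in their own database, so their _changes feed needs no filter at all.
def changes_url(base, user, since=0):
    return f"{base}/userdb-{user}/_changes?since={since}"

print(changes_url("http://localhost:5984", "quiet"))
# http://localhost:5984/userdb-quiet/_changes?since=0
```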

On 20 Oct 2010, at 14:31, wrote:

> Hello,
> Perhaps you can use the limit option together with the since option  
> to retrieve the changes feed. That way, no matter whether the application  
> is doing a first-time initialization or starting from the last known  
> sync, your code will always be the same:
> 1/ retrieve N change infos starting from sequence S (S = 0 the very  
> first time)
> 2/ did I get N changes in the CouchDB response? If so, repeat step 1,  
> setting S to the last seq number of the CouchDB response.
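The two steps above could be sketched like this in Python; `fetch` is a stand-in for an HTTP GET of `_changes?since=S&limit=N`, and all the names here are hypothetical:

```python
# Sketch of the limit+since loop: page through _changes until a short
# page (fewer than N rows) signals we've caught up with the feed.
def drain_changes(fetch, limit=100):
    """fetch(since, limit) -> (rows, last_seq); returns all rows."""
    since, out = 0, []
    while True:
        rows, last_seq = fetch(since, limit)
        out.extend(rows)
        if len(rows) < limit:   # step 2: fewer than N changes means done
            return out
        since = last_seq        # otherwise repeat step 1 from the last seq

# Fake feed of 250 sequence numbers to exercise the loop:
data = list(range(1, 251))
def fake_fetch(since, limit):
    rows = [s for s in data if s > since][:limit]
    return rows, (rows[-1] if rows else since)

print(len(drain_changes(fake_fetch)))  # 250
```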
> Regards,
> Mickael
> ----- Original Mail -----
> From: "Cory Zue" <>
> To: "user" <>
> Sent: Wednesday, 20 October 2010 14:06:59 GMT +01:00 Amsterdam /  
> Berlin / Bern / Rome / Stockholm / Vienna
> Subject: Re: Slow filtered _changes feed
> Thanks Simon,
> On Wed, Oct 20, 2010 at 6:43 AM, Simon Metson
> <> wrote:
>> Hi,
>>        One thought: do you query the last N changes or the whole  
>> feed? If I
>> apply a simple filter to my test database it is slow to get the  
>> full result,
>> but it's relatviely fast to skip to the more recent changes (the  
>> ones I've
>> already consumed and go from there.
> When the phones sync they provide a token containing information about
> the last known sync, which we use to skip ahead in the changes feed.
> However, on a first time initialization of the phone it processes the
> entire feed, and if this doesn't succeed then subsequent ones likely
> won't either.
>> Also, the _changes feed streams the documents down to the client -  
>> can your
>> client/server deal with a streaming response?
> It doesn't yet, although we could theoretically add this.
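If streaming ever gets added, consuming a continuous _changes response is mostly line-oriented JSON handling. A minimal sketch, where the `lines` iterable stands in for the HTTP response body:

```python
import json

# Minimal sketch of consuming a continuous _changes feed: each non-blank
# line of the response body is one standalone JSON change notification.
def stream_changes(lines, handle):
    for raw in lines:
        raw = raw.strip()
        if raw:                 # continuous mode emits blank heartbeat lines
            handle(json.loads(raw))

seen = []
stream_changes(['{"seq":1,"id":"a"}', "", '{"seq":2,"id":"b"}'], seen.append)
print([c["id"] for c in seen])  # ['a', 'b']
```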
> My latest plan is to basically save a copy of all the relevant
> information in a couch doc after every attempted sync.  In this case
> the first operation would still time out, but if the client retried all
> the relevant doc ids could be retrieved from that document and only a
> small update would have to be applied.  Since this is only likely a
> problem when it is a long time between syncs I think it could work ok
> (definitely not ideal, though).  Does this seem sane?
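That checkpoint idea could be sketched as follows; a plain dict stands in for the couch document, and the field names are made up for illustration. After every attempted sync the prepared ids are persisted, so a retry resumes from the checkpoint with only a small delta to apply:

```python
# Hedged sketch of a per-client checkpoint document. A dict stands in for
# the CouchDB doc; "doc_ids" and "last_seq" are hypothetical field names.
def save_checkpoint(store, client_id, doc_ids, last_seq):
    store[client_id] = {"doc_ids": list(doc_ids), "last_seq": last_seq}

def resume(store, client_id):
    # On retry, reuse the saved ids and only fetch changes past last_seq.
    cp = store.get(client_id, {"doc_ids": [], "last_seq": 0})
    return cp["doc_ids"], cp["last_seq"]

store = {}
save_checkpoint(store, "phone-1", ["d1", "d2"], 42)
print(resume(store, "phone-1"))  # (['d1', 'd2'], 42)
```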
>> Cheers
>> Simon
>> On 20 Oct 2010, at 03:44, Cory Zue wrote:
>>> Howdy,
>>> I'm bringing up a problem I chatted about with a few folks on IRC
>>> today but was unable to solve.  My app is using the _changes feed to
>>> detect what updates need to go to particular clients (in this case
>>> cell phones) based on some filtered information the phones send up  
>>> in
>>> the sync request.  The flow looks something like:
>>> Phone ---HTTP POST---> Server
>>> Server ---filtered _changes---> CouchDB
>>> [Server prepares couch results for phone]
>>> Server ---Data Payload---> Phone
>>> All of the above represents a single HTTP POST and response between
>>> the phone and server.
>>> The problem I am seeing is that hitting the _changes feed from the
>>> server is prohibitively slow, and these requests are timing out  
>>> before
>>> the server can send data back down to the phone.
>>> I was led to believe on IRC that changing my filter from  
>>> JavaScript to
>>> Erlang would drastically increase performance, but I'm not observing
>>> this at all.  In fact the Erlang version seems slightly slower.
>>> I set up an Erlang view server following these instructions:
>>> Am I missing something?  Is my erlang so bad as to negate the
>>> increased performance gain from switching over?  Was I lied to?   
>>> Is my
>>> whole approach dumb and do I need to implement filtered caching  
>>> inside
>>> my server and outside of couch?
>>> Any thoughts or feedback would be much appreciated.
>>> thanks,
>>> Cory
