incubator-couchdb-user mailing list archives

From Guby <guby.m...@gmail.com>
Subject Re: Updating views on save
Date Mon, 28 Apr 2008 13:31:37 GMT
Thank you for all the answers!

Benoit:
Good idea, and I believe that is what Jan used here as well? I didn't
know about this before! Great solution!


Kristopher:
 >In the current situation, if I write 10 times and read 100 times,  
the index may only be processed once if the 100 reads come after the  
10 writes -- in a more real-world situation though, we're only going
to be updating the index as many times as we write -- but keep in mind  
that given the right circumstances, the index only has to be  
regenerated once.

Thanks for your comments!
In my case it is actually more like 100 writes for every 10 reads... So
a read then has to wait for a substantial view update, which takes
quite a bit of time.

 >Also, please realize that the index does not have to be /completely/  
regenerated upon reindex -- only the documents that have been added/ 
modified.

That is why I thought regenerating the view for changed documents on  
save wouldn't be that much of a performance hit... but it turns out I  
was wrong. I didn't think of low level stuff like the byte layout  
mentioned by Jan. In my case updating on save is still going to be the  
best solution I believe, because I can't afford waiting long for view  
updating when I am requesting the views from the front end.

 >Lastly, I would recommend that you try to optimize your view code to  
make things faster, as well.

This is also a good idea! My view code is really straightforward,
though. I have read all the wikis as well, but if you guys know of
any "best view practice" resource, or have tips for view optimizations
handy, please let me know!


Cortland:
 >Not sure why your views are timing out, but from my current  
understanding views are incrementally updated with modifications but  
only incrementally updated on a call to that view.

My bad for not being clear. The views themselves do not time out. The  
web page generation for the end user times out because regenerating  
the view takes so much time.

 >Are you using the javascript spidermonkey view server (default) or
another one? Check the complexity of the view and possibly minimize  
the view's complexity.

I am using the standard built in spidermonkey view server. I believe  
the views to be pretty clean, but then again I have a lot to learn  
about best practices for document based databases!

 >I'm not sure, but I think the pattern here is you put views most  
likely to be called near to each other in the design document, say  
blog summaries view followed by full content view, and have less
related views in a different design document, say for a list of
authors or a tag list.

Smart approach. I have currently grouped my views by the data types
they contain: views for feeds in one design document, views for feed
entries in another, one design document for users, and so on... I'll
see how much I can change it around for the better!


Jan:
Thanks for all the comments!
I didn't know about the DbUpdateNotificationProcess functionality  
before you and Benoit mentioned it! Just what I need!

> #!/bin/sh
>
> counter=0
> max_docs=100
>
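> # Each line read from stdin is taken to be the name of an updated
> # database; the loop ends when stdin is closed.
> while read database
> 	do
> 	counter=`expr $counter + 1`
>
> 	# Query the view once every $max_docs updates to refresh its index.
> 	if [ $counter -ge $max_docs ]; then
> 		curl -s "http://server:5984/$database/_view/name?count=0" > /dev/null
> 		counter=0
> 	fi
> done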

I am wondering, why is there a loop here? Isn't the shell script  
called once every time the database receives an update? That is what I  
can read out of the documentation for the DbUpdateNotificationProcess  
in the wiki...
Couldn't it just as well be written like this?

> #!/bin/sh
>
> read database
>
> curl -s "http://server:5984/$database/_view/one_for_each/view?count=0" > /dev/null
> curl -s "http://server:5984/$database/_view/other_view/jadijadd?count=0" > /dev/null
>

What is it I am not understanding?

Thanks for all the answers and your time!


Best regards
Sebastian










On Apr 28, 2008, at 5:22 AM, Jan Lehnardt wrote:

> Heya Sebastian,
> it seems you feel rather strongly about this issue. But that's
> nothing a little engineering can't solve for you, read on :)
>
> On Apr 28, 2008, at 01:04, Guby wrote:
>> Hello dear Couchers
>>
>> I understand that the views are indexed the first time they are  
>> accessed and as far as I know there is no way to turn on view  
>> updating on document save. I really don't understand the reasoning  
>> behind this behavior. The advantage of the pre-populated/indexed  
>> views are that they are blazingly fast to query and access, but  
>> that advantage disappears when the first request after a document  
>> update has to regenerate the view first!
>> I am currently building a web app where the background processes  
>> perform a lot of writes to the database. The time it takes to write  
>> a document is not critical for me. What is critical though is the  
>> time it takes to load web pages for the end user that require  
>> content from the database.
>> In some situations the background processes add thousands of  
>> documents to the database within a short period of time, and when  
>> the user tries to access a page after such an update the view  
>> querying sometimes takes minutes and as a consequence of that the  
>> browser times out... Not a recipe for happy customers...
>>
>> The only solution I can see at the moment is to create a worker  
>> that queries the database whenever it is told that there has been a  
>> document update, but that seems really stupid and unnecessary. And  
>> in my case, running on a smallish VPS, a big waste of resources  
>> having an extra worker doing something the database itself could  
>> just as well have done. It also requires a lot of extra coding  
>> notifying the worker whenever I update or create a document  
>> throughout my app.
>
> That would be a rather extreme solution. Why not, for
> example, trigger a view update from your document-
> insertion code, every N (N = 10, 30, 60?) seconds?
>
>
>> I am sure you have reasons for having implemented the views the way  
>> you have, but I would be really interested to hear why it has been  
>> done this way!
>
> 1) To not have a 'write penalty' for all views when
> documents are added. We expect you to have
> quite a few views and updating all of them on-write
> seems silly. The data is generated when needed,
> saving resources by 2) not clogging them up when
> needed elsewhere and 3) processing large quantities
> of data in batches. And finally, 4) the very layout of the
> bytes that make up documents on disk and the way they
> are read are optimised for super-fast index creation. This
> is expected to be a common operation. I still understand
> that this leaves things to be desired for you.
>
>
>> My wishes are for an optional updating of views on save feature! In  
>> some cases that might regenerate a view several times without it  
>> actually being accessed in between, but that is a tradeoff I can  
>> live with, slow views on the other hand is something I can not!
>
> Put this in a shell script called view_trigger.sh
>
> #!/bin/sh
>
> counter=0
> max_docs=100
>
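> # Each line read from stdin is taken to be the name of an updated
> # database; the loop ends when stdin is closed.
> while read database
> 	do
> 	counter=`expr $counter + 1`
>
> 	# Query the view once every $max_docs updates to refresh its index.
> 	if [ $counter -ge $max_docs ]; then
> 		curl -s "http://server:5984/$database/_view/name?count=0" > /dev/null
> 		counter=0
> 	fi
> done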
>
> and add view_trigger.sh to your couch.ini as a
> DbUpdateNotificationProcess
>
> voilà :)
>
> Yes, this is extra work externally, but this is still a sensible
> solution. From our perspective, we do not need to change
> the core server behaviour to get you what you need and
> you still benefit from the batching of index creation.
>
> Also, I'd like to second what Cortland said: All views in a
> design document get updated if you query one of them.
> Be aware of that :)
>
> And on a final note: Thanks for writing in. Don't be
> discouraged by the replies. If there are other things that
> you would love to see in CouchDB, please let us know.
>
> Also, if enough users request a feature, we will consider
> putting it in, even on-
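
A minimal sketch of the batching idea from the view_trigger.sh script quoted above, restructured into functions so it can be tried without a running server. It assumes, as the loop in that script suggests, that the notification process is started once and then receives one database name per line on stdin. The names refresh_view, batch_trigger, COUCH_URL, VIEW_PATH and DRY_RUN are made up for this illustration; server:5984 and _view/name are the placeholders used in the thread.

```shell
#!/bin/sh
# Sketch only: COUCH_URL, VIEW_PATH and DRY_RUN are hypothetical knobs,
# not CouchDB settings.
COUCH_URL="${COUCH_URL:-http://server:5984}"
VIEW_PATH="${VIEW_PATH:-_view/name}"   # placeholder view from the thread
DRY_RUN="${DRY_RUN:-1}"                # 1 = only print what would be queried

# refresh_view DB: query one view of DB so that its index gets updated.
refresh_view() {
    url="$COUCH_URL/$1/$VIEW_PATH?count=0"
    if [ "$DRY_RUN" = "1" ]; then
        echo "refresh $url"
    else
        # No backticks here: the response is discarded, not executed.
        curl -s -o /dev/null "$url"
    fi
}

# batch_trigger MAX: read database names from stdin, one per line, and
# refresh the view after every MAX lines. Returns when stdin is closed.
batch_trigger() {
    max_docs="$1"
    counter=0
    while read -r database; do
        counter=$((counter + 1))
        if [ "$counter" -ge "$max_docs" ]; then
            refresh_view "$database"
            counter=0
        fi
    done
}
```

With the defaults above, `printf 'feeds\nfeeds\nfeeds\n' | batch_trigger 2` prints a single line, `refresh http://server:5984/feeds/_view/name?count=0`: the second update triggers the refresh and the third only restarts the count. Like the quoted script, the counter is shared across all databases, so the refresh goes to whichever database produced the update that reached the threshold.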

