couchdb-dev mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: Big problems with writes to _config
Date Wed, 28 Oct 2009 02:46:28 GMT
On Oct 27, 2009, at 7:49 PM, eric casteleijn wrote:

> We have the following setup:
>
> 2 near identical public facing django servers communicating with one  
> couchdb server. The couchdb server is oauth authenticated and people  
> can access it directly (well, through an apache proxy) if they have  
> the tokens to do so. New users are signed up through these django  
> servers, after which they add the user and their tokens to couchdb.  
> (the user through a POST to _users and the tokens through PUTs to  
> _config)
>
> We see this failing a lot, now to the point where we think it fails  
> all the time (since all those systems have separate logs not all of  
> which we have access to, this is not trivial to piece together.)
>
> The errors the API servers get back all look like these (the lines
> starting with '(500'):
>
> '2009-10-27 22:35:15,357 ERROR    UbuntuOne.couch: failed to add  
> ***** = 40693 to section [oauth_token_users] of local.ini:
>
> (500, (u'timeout', u'{gen_server,call,\n            [couch_config, 
> \n          {set,"oauth_token_users","*****","40693",true}]}'))'
>
> '2009-10-27 22:35:20,399 ERROR    UbuntuOne.couch: failed to add  
> ***** = ***** to section [oauth_token_secrets] of local.ini:
>
> (500, (u'timeout', u'{gen_server,call,\n            [couch_config, 
> \n          {set,"oauth_token_secrets","*****",\n "*****", 
> \n                  true}]}'))'
>
> Corresponding errors in the couchdb.log look like:
>
>
> My theory was that these writes to _config fail because the  
> local.ini is somehow corrupted, but I can't access that file  
> directly (since it has users' secrets) or copy it to my machine to  
> test this theory, and helping someone who is allowed to see it look  
> for anything weird is like searching for the proverbial needle in  
> the haystack: we have lots of users, and users can have multiple  
> tokens. Add to that the fact that you cannot ever delete a line from  
> the .ini file (DELETEs against keys in _config just empty the value
> and leave a line like 'foo = \n')!
>
> After speaking to Jan on the channel he proposed that it may be that  
> the gen_server message inbox overflows and the gen_server times out.
>
> Could that be, under high load, and how can we solve this? Can we  
> increase the size of this inbox, or can we possibly have multiple  
> processes handling the access? Whether it's high load or corruption  
> or something else again, right now it looks like NO new tokens can  
> be added, and hence no new users can use our system. In short: HALP!

Hi Eric, I think we all know the long-term solution is to store oauth
information in a DB instead of the config file.  Barring that, some
short-term steps that can be taken to avoid these errors include:

1) extending or disabling the couch_config gen_server timeout.  The  
default is 5000 milliseconds.  This is a one-line patch.

2) Writing to the .ini file asynchronously.  The in-memory  
configuration state can sustain update rates that are orders (plural)  
of magnitude larger than the update rate for the .ini file itself.   
With a bit of work you could cook it so that you still didn't respond  
to the PUT /_config/... request until the update was actually written  
to the file, while at the same time freeing the config server to  
handle more requests.
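
To make option 2 concrete, here is a minimal sketch of the pattern in Python rather than Erlang. It is not couch_config's code and not a patch; ConfigServer, set, get and the write_to_disk callback are hypothetical names chosen for illustration. The in-memory state is updated immediately, the slow file write is handed to a background writer, and each caller gets a Future it can wait on so that its response is still held back until the write has actually happened.

```python
import queue
import threading
import time
from concurrent.futures import Future


class ConfigServer:
    """Hypothetical stand-in for a config process with a slow backing file."""

    def __init__(self, write_to_disk):
        self._state = {}                  # in-memory configuration
        self._lock = threading.Lock()
        self._pending = queue.Queue()     # file writes waiting to happen
        self._write_to_disk = write_to_disk
        threading.Thread(target=self._drain, daemon=True).start()

    def set(self, section, key, value):
        """Update memory now; return a Future that resolves after the file write."""
        done = Future()
        with self._lock:
            self._state[(section, key)] = value
            # Enqueue under the lock so file writes keep the in-memory order.
            self._pending.put((section, key, value, done))
        return done

    def get(self, section, key):
        with self._lock:
            return self._state.get((section, key))

    def _drain(self):
        # Single background writer: the only place that touches the file.
        while True:
            section, key, value, done = self._pending.get()
            try:
                self._write_to_disk(section, key, value)
                done.set_result(True)
            except Exception as exc:
                done.set_exception(exc)


def demo():
    written = []

    def slow_ini_write(section, key, value):
        time.sleep(0.01)                  # pretend the file write is slow
        written.append((section, key, value))

    server = ConfigServer(slow_ini_write)
    acks = [server.set("oauth_token_users", "token%d" % i, str(i))
            for i in range(5)]
    visible = server.get("oauth_token_users", "token4")  # already in memory
    for ack in acks:
        ack.result(timeout=5)             # the HTTP response would wait here
    return visible, written
```

In demo(), all five set calls return right away and the last value is readable from memory before any of the slow writes has finished; only the wait on each Future takes as long as the file write. The caller that is waiting is the request handler, not the config server, so the server stays free to accept further updates.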

In each case the response times for PUT /_config/... may become
uncomfortably long, but at least you won't be serving 500s from couch.

Best, Adam
