couchdb-user mailing list archives

From Mike Leddy <m...@loop.com.br>
Subject Re: Couchdb trunk purge_docs timeout
Date Mon, 02 May 2011 17:17:11 GMT
Sorry it's been two weeks, but I found out what was happening. I was
reprocessing some data that was being passed through a script to
resolve conflicts. However, the number of conflicts being reprocessed
was much higher than had been encountered earlier.

Effectively the _bulk_docs call to resolve conflicts was taking
more than 5 seconds to complete, so other callers to
couch_db_updater on the same database were timing out. As what I was
doing was pretty extreme, it was interfering with purging,
replication, calls to compact, etc.

I could see that the couch_db_updater process for the database was
spending most of its time in {couch_key_tree,merge_one} and
consequently not serving requests (I have to learn how to profile
Erlang :-( )

I wrote a simple(ish) benchmarking script that reproduces what
I was seeing:

###################################################################

require 'restclient'
require 'json'

Database='http://127.0.0.1:5984/conflict_test'

def flush_all
  # Flush all to disk
  RestClient::Request.execute(:method => :post,
    :url => Database+'/_ensure_full_commit',
    :headers => {:content_type => 'application/json'})
end


def run(documents)

  setup = Time.new
  # Prepare database
  RestClient::Request.execute(:method => :delete, :url => Database) rescue RestClient::ResourceNotFound
  RestClient::Request.execute(:method => :put, :url => Database)

  # Fill with conflicts
  docs=documents.times.collect {|i| {:_id => 'samedoc', :some_data => i}}
  RestClient::Request.execute(:method => :post,
    :url => Database+'/_bulk_docs',
    :payload => {'all_or_nothing' => true, 'docs' => docs}.to_json,
    :headers => {:content_type => 'application/json'})

  # Resolve the conflicts
  meta=JSON.parse(RestClient::Request.execute(:method => :get,
    :url => Database+'/samedoc?meta=true').body)
  revisions=[meta['_rev']]+meta['_conflicts']
  # delete_at removes the randomly chosen winner by index, leaving
  # only the losing revisions to be deleted below
  winner=revisions.delete_at(rand(revisions.size))

  flush_all

  # Do it
  bulk=revisions.collect {|r| {'_id'=>'samedoc', '_rev'=>r, '_deleted'=>true}}
  resolve = Time.new
  RestClient::Request.execute(:method => :post,
    :url => Database+'/_bulk_docs',
    :payload => {'all_or_nothing' => true, 'docs' => bulk}.to_json,
    :headers => {:content_type => 'application/json'},
    :timeout => 3600)

  to_disk = Time.new
  flush_all

  now=Time.new
  puts "#{documents} #{resolve-setup} #{to_disk-resolve} #{now-to_disk}"

end

10.step(500, 10) {|n| run n}

###################################################################

If you run

curl -s -X POST -H 'Content-Type: application/json' 127.0.0.1:5984/conflict_test/_compact

while a long conflict-resolution call is in progress, you'll get an
internal server error:

[Mon, 02 May 2011 16:56:01 GMT] [info] [<0.10147.0>] 127.0.0.1 - - POST /conflict_test/_bulk_docs
201
[Mon, 02 May 2011 16:56:06 GMT] [error] [<0.10193.0>] Uncaught error in HTTP request:
{exit,
                                                       {timeout,
                                                        {gen_server,call,
                                                         [<0.10188.0>,
                                                          start_compact]}}}
[Mon, 02 May 2011 16:56:06 GMT] [info] [<0.10193.0>] Stacktrace: [{io_lib_pretty,cind_tag_tuple,7},
                                  {io_lib_pretty,while_fail,3},
                                  {io_lib_pretty,print,6},
                                  {io_lib_format,build,3},
                                  {io_lib_format,build,3},
                                  {io_lib_format,build,3},
                                  {io_lib_format,build,3},
                                  {io_lib_format,build,3}]
[Mon, 02 May 2011 16:56:06 GMT] [info] [<0.10193.0>] 127.0.0.1 - - POST /conflict_test/_compact
500

I altered my script to avoid excessively large conflict-resolution
updates, so everything is working fine now. (My test machine starts
taking 5 seconds at around 200 conflicts resolved in one go.)
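For reference, the change amounts to deleting the losing revisions in
small batches rather than all at once. A minimal sketch (the helper
name and the batch size of 100 are my own choices, not from the
original script):

```ruby
require 'json'

# Split the conflict resolution into several small _bulk_docs payloads
# instead of deleting every losing revision in one call. BATCH_SIZE is
# arbitrary; pick whatever keeps each call well under the 5 second
# gen_server timeout on your hardware.
BATCH_SIZE = 100

def resolution_batches(doc_id, losing_revisions, batch_size = BATCH_SIZE)
  losing_revisions.each_slice(batch_size).map do |revs|
    docs = revs.collect { |r| {'_id' => doc_id, '_rev' => r, '_deleted' => true} }
    {'all_or_nothing' => true, 'docs' => docs}.to_json
  end
end
```

Each payload would then be POSTed to Database+'/_bulk_docs' in turn,
exactly as in the script above.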

I did note that when you plot the data, the time for the bulk
conflict-resolution call grows much faster than linearly, noticeably
faster than the setup calls (bulk insert and GET with meta)...

Regards,

Mike




On Thu, 2011-04-14 at 12:14 -0300, Mike Leddy wrote:
> Hi,
> 
> I have a couch node currently running version 1.2.0abaa0e30-git. I decided
> to try a database maintenance task that I formerly used on
> couchdb 1.0.2 to purge documents in batches of 500 on a database that
> contains some 56 million documents.
> 
> From what I can gather from the logs, the call to purge_docs is timing
> out after 5 seconds and terminating.
> 
> [Wed, 13 Apr 2011 20:25:57 GMT] [info] [<0.5192.19>] 172.17.17.3 - - GET /iris/_design/tidy/_view/conflicts?limit=0
200
> [Wed, 13 Apr 2011 20:26:02 GMT] [error] [<0.5194.19>] Uncaught error in HTTP request:
{exit,
>                                                        {timeout,
>                                                         {gen_server,call,
>                                                          [<0.150.0>,
>                                                           {purge_docs,
>                                                            [{<<"1294099271F6261">>,
>                                                              [{1,
>                                                                <<181,64,95,
>                                                                  54,247,104,
>                                                                  56,34,109,
>                                                                  228,7,108,
>                                                                  250,72,57,
>                                                                  190>>}]},
>                                                             {<<"1294099281F7327">>,
>                                                              [{1,
>                                                                <<80,246,15,
>                                                                  155,182,61,
>                                                                  43,238,207,
>                                                                  43,159,136,
>                                                                  178,134,
>                                                                  137,214>>}]},
> ... removed for brevity ....
> [Wed, 13 Apr 2011 20:26:02 GMT] [info] [<0.5194.19>] Stacktrace: [{io_lib_pretty,cind_tag_tuple,7},
>                                   {io_lib_pretty,while_fail,3},
>                                   {io_lib_pretty,print,6},
>                                   {io_lib_format,build,3},
>                                   {io_lib_format,build,3},
>                                   {io_lib_format,build,3},
>                                   {io_lib_format,build,3},
>                                   {io_lib_format,build,3}]
> [Wed, 13 Apr 2011 20:26:02 GMT] [info] [<0.5194.19>] 172.17.17.3 - - POST /iris/_purge
500
> 
> I am pretty sure that this was not the case with 1.0.2. Does anyone
> have any insight into the root cause of the problem?
> 
> Meanwhile I'm digging through the code looking for clues....
> 
> Thanks,
> 
> Mike
> 
> 


