incubator-couchdb-user mailing list archives

From Pasi Eronen ...@iki.fi>
Subject CouchDB 1.0.2 errors under load
Date Fri, 25 Feb 2011 09:18:47 GMT
Hi,

I had a big batch job (inserting 10M+ documents and generating views for them)
that ran just fine for about 6 hours, but then I got this error:

[Thu, 24 Feb 2011 19:42:57 GMT] [error] [<0.276.0>] ** Generic server
<0.276.0> terminating
** Last message in was delayed_commit
** When Server state == {db,<0.275.0>,<0.276.0>,nil,<<"1298547642391489">>,
                            <0.273.0>,<0.277.0>,
                            {db_header,5,739828,0,
                                {4778613011,{663866,0}},
                                {4778614954,663866},
                                nil,0,nil,nil,1000},
                            739828,
                            {btree,<0.273.0>,
                                {4778772755,{663866,0}},
                                #Fun<couch_db_updater.7.10053969>,
                                #Fun<couch_db_updater.8.35220795>,
                                #Fun<couch_btree.5.124754102>,
                                #Fun<couch_db_updater.9.107593676>},
                            {btree,<0.273.0>,
                                {4778774698,663866},
                                #Fun<couch_db_updater.10.30996817>,
                                #Fun<couch_db_updater.11.96515267>,
                                #Fun<couch_btree.5.124754102>,
                                #Fun<couch_db_updater.12.117826253>},
                            {btree,<0.273.0>,nil,
                                #Fun<couch_btree.0.83553141>,
                                #Fun<couch_btree.1.30790806>,
                                #Fun<couch_btree.2.124754102>,nil},
                            739831,<<"foo_replication_tmp">>,
                            "/data/foo/couchdb-data/foo_replication_tmp.couch",
                            [],[],nil,
                            {user_ctx,null,[],undefined},
                            #Ref<0.0.1793.256453>,1000,
                            [before_header,after_header,on_file_open],
                            false}
** Reason for termination ==
** {{badmatch,{error,emfile}},
    [{couch_file,sync,1},
     {couch_db_updater,commit_data,2},
     {couch_db_updater,handle_info,2},
     {gen_server,handle_msg,5},
     {proc_lib,init_p_do_apply,3}]}

(plus a lot of other messages with the same timestamp; I can send them if they're useful)

At exactly this time, the client got an HTTP 500 status code; the request
was a bulk get (POST /foo_replication_tmp/_all_docs?include_docs=true).

Just before this request, the client had made a PUT (updating an existing
document) that got a 200 status code, but apparently it was not successfully
committed to disk (I'm using "delayed_commits=true"; for my app,
this is just fine). The client had received the new _rev value, but when
it tried updating the same document a minute later, there was a conflict
(and it's not possible that somebody else updated this same document).

About four hours later, there was a different error ("accept_failed"
sounds like some temporary problem with sockets?):

[Thu, 24 Feb 2011 23:55:42 GMT] [error] [<0.20693.4>] {error_report,<0.31.0>,
              {<0.20693.4>,std_error,
               [{application,mochiweb},
                "Accept failed error","{error,emfile}"]}}

[Thu, 24 Feb 2011 23:55:42 GMT] [error] [<0.20693.4>] {error_report,<0.31.0>,
    {<0.20693.4>,crash_report,
     [[{initial_call,{mochiweb_socket_server,acceptor_loop,['Argument__1']}},
       {pid,<0.20693.4>},
       {registered_name,[]},
       {error_info,
           {exit,
               {error,accept_failed},
               [{mochiweb_socket_server,acceptor_loop,1},
                {proc_lib,init_p_do_apply,3}]}},
       {ancestors,
           [couch_httpd,couch_secondary_services,couch_server_sup,<0.32.0>]},
       {messages,[]},
       {links,[<0.106.0>]},
       {dictionary,[]},
       {trap_exit,false},
       {status,running},
       {heap_size,233},
       {stack_size,24},
       {reductions,200}],
      []]}}

(+lots of other messages within the next couple of minutes)

The same error occurred once more, about four hours later.

I'm quite new to CouchDB, so I'd appreciate any help in interpreting
what these error messages mean. (BTW, are these something I should
report as bugs in JIRA? I can do that, but I'd like to at least understand
which parts of the error messages are actually relevant here :-)

I'm running CouchDB 1.0.2 with Erlang R14B on 64-bit RHEL 5.6.
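
A note on the `emfile` term that appears in both traces above: it is the Erlang spelling of the POSIX EMFILE errno ("too many open files"), so one thing that may be worth checking is the per-process file descriptor limit. Below is a minimal sketch in Python, assuming a Linux host with /proc mounted. The function names `fd_limits` and `open_fd_count` are hypothetical helpers for illustration, not part of CouchDB or Erlang.

```python
import os
import resource


def fd_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count the open descriptors of a process via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
    print(f"descriptors open in this process: {open_fd_count()}")
```

To look at the CouchDB process rather than the Python interpreter, pass its pid to `open_fd_count`; reading another process's /proc entry generally requires running as the same user or as root. Note that `fd_limits` reports the limit of the process that calls it, which may differ from the limit the CouchDB process was started with.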

Best regards,
Pasi
