incubator-couchdb-user mailing list archives

From CGS <cgsmcml...@gmail.com>
Subject Re: CouchDB 1.1.1 mysteriously crashing under heavy load
Date Fri, 09 Dec 2011 09:10:32 GMT
Hi Hristo,

The problem starts before the part of the log you presented. For
example, could you copy-paste the information about process <0.86.0>?
From the look of the log, something is repeatedly crashing the generic
server (without stopping the application), and that is why you cannot
access your documents. I couldn't find what that something is in the
part of the log you posted, but process <0.86.0> could provide some
hints. That explains why you are able to "fix" the problem by restarting
CouchDB, and also your "OS process has timed out" messages (which are
the effect, not the cause, of the problem). What is crashing your
server, I have no idea, but the log should provide the necessary
information.
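To pull out everything the log says about one process such as <0.86.0>, it helps to keep whole multi-line entries rather than single matching lines, since Erlang reports span many lines. A minimal Python sketch, assuming the entry header format seen in the excerpt ("[date] [level] [<pid>]"); the function names are hypothetical and the log path depends on your installation.

```python
import re
from typing import Iterable, Iterator, List

# Assumption: a CouchDB 1.1 log entry begins with a header such as
# "[Thu, 08 Dec 2011 20:17:16 GMT] [error] [<0.78.0>] ..."; the
# continuation lines of a multi-line Erlang report do not.
ENTRY_START = re.compile(r"^\[[^\]]+\] \[\w+\] \[<[\d.]+>\]")


def split_entries(lines: Iterable[str]) -> Iterator[str]:
    """Group raw log lines into whole entries (header plus continuations)."""
    current: List[str] = []
    for line in lines:
        if ENTRY_START.match(line) and current:
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        yield "".join(current)


def entries_mentioning(lines: Iterable[str], pid: str) -> List[str]:
    """Return every entry whose text contains `pid`, e.g. "<0.86.0>"."""
    return [entry for entry in split_entries(lines) if pid in entry]


def print_entries(log_path: str, pid: str) -> None:
    """Print all entries of the log file at `log_path` that mention `pid`."""
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for entry in entries_mentioning(log, pid):
            print(entry, end="")
```

For example, print_entries("/path/to/couch.log", "<0.86.0>") would print the supervisor report for couch_secondary_services along with any other entry that was logged by, or mentions, that process.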

It may also be a good idea to check the free space on your hard disk and
the maximum file size allowed by your OS. But first, I would check what
information process <0.86.0> can provide.
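Both checks can be scripted. A minimal Python sketch (Unix-only, since it uses the standard resource module; the function name is hypothetical) that reports the free space on the filesystem holding a directory, for instance the data directory from your log, /data/couchdb/data, together with the RLIMIT_FSIZE file-size limit in bytes. Note that getrlimit reports the limits of the Python process running the check, which can differ from those of the beam process; on Linux, the running beam process's own limits are listed in /proc/<pid>/limits.

```python
import resource
import shutil


def storage_limits(path: str) -> dict:
    """Free/total space of the filesystem holding `path`, plus this process's
    RLIMIT_FSIZE (max file size) soft and hard limits in bytes.

    Caveat: the limits are those of the process calling this function, not
    necessarily those of the CouchDB (beam) process.
    """
    usage = shutil.disk_usage(path)
    soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)

    def fmt(limit: int):
        # RLIM_INFINITY means the OS imposes no maximum file size.
        return "unlimited" if limit == resource.RLIM_INFINITY else limit

    return {
        "free_bytes": usage.free,
        "total_bytes": usage.total,
        "fsize_soft": fmt(soft),
        "fsize_hard": fmt(hard),
    }
```

For example, storage_limits("/data/couchdb/data") returns a dict you can print or log next to the CouchDB errors.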

CGS




On 12/08/2011 10:41 PM, Hristo Deshev wrote:
> Hi everyone,
>
> I moved some data from an Amazon EC2 small instance to a large one and in
> the process upgraded from CouchDB 1.1.0 to CouchDB 1.1.1. I also went with
> Erlang R14B04 instead of R14B03 (Hurray for commando updates!) and now my
> CouchDB instance seems to sometimes die when under heavy load. By "dying" I
> mean that the beam process seems to stay in memory, but the HTTP server is
> gone and no requests get served. For now I "fix" this by stopping and
> restarting the process.
>
> Here are some details on my setup. The server is running a 64-bit Ubuntu
> Server (Oneiric) Amazon EC2 image on a large instance with 2 CPU cores and
> 7.5 GB RAM. I build both Erlang and CouchDB from source. I collect log
> entries and bulk insert them in batches of up to 200 documents. I also run
> couchdb-lucene on the same host and I *think* most of the crashes happen
> when couchdb-lucene is running a tough query and is hogging the CPU or the
> HDD. I have some largish dbs (~50 million documents, ~25 GB of disk
> space). I plan on splitting my dbs into smaller ones. I hope that gets me
> more responsive file access and faster full text index searches. I think my
> lucene indexes may be getting too large for that machine's memory and it
> can't serve them too well. I frequently get "OS process has timed out"
> errors when trying to query those indexes. Anyway, that shouldn't be
> crashing the core couchdb process, right?
>
> Below is what I think is the relevant portion of the CouchDB log file, in
> the hope that somebody can decipher something from it. Am I correct in
> thinking that the "** Reason for termination == ** {timeout," part means
> the process is crashing because a read from or write to a file timed out?
> Any help is greatly appreciated.
>
> Best,
> Hristo
>
> ===============
>
> [Thu, 08 Dec 2011 20:17:16 GMT] [error] [<0.78.0>] {error_report,<0.31.0>,
>                         {<0.78.0>,supervisor_report,
>                          [{supervisor,{local,couch_server_sup}},
>                           {errorContext,child_terminated},
>                           {reason,shutdown},
>                           {offender,
>                               [{pid,<0.86.0>},
>                                {name,couch_secondary_services},
>                                {mfargs,
>                                    {couch_server_sup,start_secondary_services,
>                                        []}},
>                                {restart_type,permanent},
>                                {shutdown,infinity},
>                                {child_type,supervisor}]}]}}
> [Thu, 08 Dec 2011 20:17:21 GMT] [error] [<0.407.0>] ** Generic server <0.407.0> terminating
> ** Last message in was delayed_commit
> ** When Server state == {db,<0.406.0>,<0.407.0>,nil,<<"1323371423957954">>,
>                              <0.404.0>,<0.408.0>,
>                              {db_header,5,204982,0,
>                                  {199491055,{204980,0}},
>                                  {199498140,204980},
>                                  {111685732,[]},
>                                  0,nil,nil,1000},
>                              204982,
>                              {btree,<0.404.0>,
>                                  {199513565,{205011,0}},
>                                  #Fun<couch_db_updater.10.19222179>,
>                                  #Fun<couch_db_updater.11.21515767>,
>                                  #Fun<couch_btree.5.112258129>,
>                                  #Fun<couch_db_updater.12.93888648>},
>                              {btree,<0.404.0>,
>                                  {199518784,205011},
>                                  #Fun<couch_db_updater.13.40165027>,
>                                  #Fun<couch_db_updater.14.82810239>,
>                                  #Fun<couch_btree.5.112258129>,
>                                  #Fun<couch_db_updater.15.104121193>},
>                              {btree,<0.404.0>,
>                                  {111685732,[]},
>                                  #Fun<couch_btree.0.23070627>,
>                                  #Fun<couch_btree.1.117278773>,
>                                  #Fun<couch_btree.2.112258129>,nil},
>                              205013,
>                              <<"database1">>,
>                              "/data/couchdb/data/database1.couch",
>                              [],[],nil,
>                              {user_ctx,null,[],undefined},
>                              #Ref<0.0.30.131014>,1000,
>                              [before_header,after_header,on_file_open],
>                              false}
> ** Reason for termination ==
> ** {timeout,
>         {gen_server,call,
>             [<0.406.0>,
>              {db_updated,
>               {db,<0.406.0>,<0.407.0>,nil,<<"1323371423957954">>,<0.404.0>,
>                      <0.408.0>,
>                      {db_header,5,205013,0,
>                          {199513565,{205011,0}},
>                          {199518784,205011},
>                          {111685732,[]},
>                          0,nil,nil,1000},
>                      205013,
>                      {btree,<0.404.0>,
>                          {199513565,{205011,0}},
>                          #Fun<couch_db_updater.10.19222179>,
>                          #Fun<couch_db_updater.11.21515767>,
>                          #Fun<couch_btree.5.112258129>,
>                          #Fun<couch_db_updater.12.93888648>},
>                      {btree,<0.404.0>,
>                          {199518784,205011},
>                          #Fun<couch_db_updater.13.40165027>,
>                          #Fun<couch_db_updater.14.82810239>,
>                          #Fun<couch_btree.5.112258129>,
>                          #Fun<couch_db_updater.15.104121193>},
>                      {btree,<0.404.0>,
>                          {111685732,[]},
>                          #Fun<couch_btree.0.23070627>,
>                          #Fun<couch_btree.1.117278773>,
>                          #Fun<couch_btree.2.112258129>,nil},
>                      205013,
>                      <<"database1">>,
>                      "/data/couchdb/data/database1.couch",
>                      [],[],nil,
>                      {user_ctx,null,[],undefined},
>                      nil,1000,
>                      [before_header,after_header,on_file_open],
>                      false}}]}}
>
> [Thu, 08 Dec 2011 20:17:21 GMT] [error] [<0.407.0>] {error_report,<0.31.0>,
>                       {<0.407.0>,crash_report,
>                        [[{initial_call,{couch_db_updater,init,['Argument__1']}},
>                          {pid,<0.407.0>},
>                          {registered_name,[]},
>                          {error_info,
>                           {exit,
>                            {timeout,
>                             {gen_server,call,
>                              [<0.406.0>,
>                               {db_updated,
>                                {db,<0.406.0>,<0.407.0>,nil,
>                                 <<"1323371423957954">>,<0.404.0>,<0.408.0>,
>                                 {db_header,5,205013,0,
>                                  {199513565,{205011,0}},
>                                  {199518784,205011},
>                                  {111685732,[]},
>                                  0,nil,nil,1000},
>                                 205013,
>                                 {btree,<0.404.0>,
>                                  {199513565,{205011,0}},
>                                  #Fun<couch_db_updater.10.19222179>,
>                                  #Fun<couch_db_updater.11.21515767>,
>                                  #Fun<couch_btree.5.112258129>,
>                                  #Fun<couch_db_updater.12.93888648>},
>                                 {btree,<0.404.0>,
>                                  {199518784,205011},
>                                  #Fun<couch_db_updater.13.40165027>,
>                                  #Fun<couch_db_updater.14.82810239>,
>                                  #Fun<couch_btree.5.112258129>,
>                                  #Fun<couch_db_updater.15.104121193>},
>                                 {btree,<0.404.0>,
>                                  {111685732,[]},
>                                  #Fun<couch_btree.0.23070627>,
>                                  #Fun<couch_btree.1.117278773>,
>                                  #Fun<couch_btree.2.112258129>,nil},
>                                 205013,
>                                 <<"database1">>,
>                                 "/data/couchdb/data/database1.couch",
>                                 [],[],nil,
>                                 {user_ctx,null,[],undefined},
>                                 nil,1000,
>                                 [before_header,after_header,on_file_open],
>                                 false}}]}},
>                            [{gen_server,terminate,6},
>                             {proc_lib,init_p_do_apply,3}]}},
>                          {ancestors,[<0.406.0>,<0.403.0>]},
>                          {messages,[{'EXIT',<0.406.0>,shutdown}]},
>                          {links,[]},
>                          {dictionary,[]},
>                          {trap_exit,true},
>                          {status,running},
>                          {heap_size,28657},
>                          {stack_size,24},
>                          {reductions,4487709}],
>                         []]}}
> [Thu, 08 Dec 2011 20:17:22 GMT] [error] [<0.178.0>] ** Generic server <0.178.0> terminating
> ** Last message in was {update_docs,<0.2027.0>,
>                             [[{doc,<<"55e776b94547442ab17b82bd1a059843">>,
>                                   {1,
>                                    [<<102,77,172,235,192,72,84,223,58,68,105,
>                                       199,153,147,196,81>>]},
>                                   {[{<<"host">>,<<"Host1">>},
>                                     {<<"time">>,1323375464000},
>                                     {<<"text">>,
>                                      <<"Some text">>},
>                                     {<<"level">>,0},
>                                     {<<"source">>,<<"source1">>},
>                                     {<<"type">>,<<"Entry1">>}]},
>                                   [],false,[]}],
>
> ...
> [[A BUNCH OF DOCS HERE]]
> ...
>
>
>                                   {[{<<"host">>,<<"Host1">>},
>                                     {<<"time">>,1323375467000},
>                                     {<<"text">>,
>                                      <<"Some text">>},
>                                     {<<"level">>,0},
>                                     {<<"source">>,<<"source1">>},
>                                     {<<"type">>,<<"Entry1">>}]},
>                                   [],false,[]}]],
>                             [],false,false}
> ** When Server state == {db,<0.177.0>,<0.178.0>,nil,<<"1323371411352029">>,
>                              <0.175.0>,<0.179.0>,
>                              {db_header,5,13636863,0,
>                                  {6776455960,{13636861,0}},
>                                  {6776479023,13636861},
>                                  {1039786,[]},
>                                  0,nil,nil,1000},
>                              13636863,
>                              {btree,<0.175.0>,
>                                  {6776455960,{13636861,0}},
>                                  #Fun<couch_db_updater.10.19222179>,
>                                  #Fun<couch_db_updater.11.21515767>,
>                                  #Fun<couch_btree.5.112258129>,
>                                  #Fun<couch_db_updater.12.93888648>},
>                              {btree,<0.175.0>,
>                                  {6776479023,13636861},
>                                  #Fun<couch_db_updater.13.40165027>,
>                                  #Fun<couch_db_updater.14.82810239>,
>                                  #Fun<couch_btree.5.112258129>,
>                                  #Fun<couch_db_updater.15.104121193>},
>                              {btree,<0.175.0>,
>                                  {1039786,[]},
>                                  #Fun<couch_btree.0.23070627>,
>                                  #Fun<couch_btree.1.117278773>,
>                                  #Fun<couch_btree.2.112258129>,nil},
>                              13636863,
>                              <<"database2">>,
>                              "/data/couchdb/data/database2.couch",
>                              [],[],nil,
>                              {user_ctx,null,[],undefined},
>                              nil,1000,
>                              [before_header,after_header,on_file_open],
>                              false}
> ** Reason for termination ==
> ** {timeout,
>         {gen_server,call,
>             [<0.177.0>,
>              {db_updated,
>               {db,<0.177.0>,<0.178.0>,nil,<<"1323371411352029">>,<0.175.0>,
>                      <0.179.0>,
>                      {db_header,5,13636863,0,
>                          {6776455960,{13636861,0}},
>                          {6776479023,13636861},
>                          {1039786,[]},
>                          0,nil,nil,1000},
>                      13636863,
>                      {btree,<0.175.0>,
>                          {6776557909,{13637061,0}},
>                          #Fun<couch_db_updater.10.19222179>,
>                          #Fun<couch_db_updater.11.21515767>,
>                          #Fun<couch_btree.5.112258129>,
>                          #Fun<couch_db_updater.12.93888648>},
>                      {btree,<0.175.0>,
>                          {6776580448,13637061},
>                          #Fun<couch_db_updater.13.40165027>,
>                          #Fun<couch_db_updater.14.82810239>,
>                          #Fun<couch_btree.5.112258129>,
>                          #Fun<couch_db_updater.15.104121193>},
>                      {btree,<0.175.0>,
>                          {1039786,[]},
>                          #Fun<couch_btree.0.23070627>,
>                          #Fun<couch_btree.1.117278773>,
>                          #Fun<couch_btree.2.112258129>,nil},
>                      13637063,
>                      <<"database2">>,
>                      "/data/couchdb/data/database2.couch",
>                      [],[],nil,
>                      {user_ctx,null,[],undefined},
>                      #Ref<0.0.30.133811>,1000,
>                      [before_header,after_header,on_file_open],
>                      false}}]}}
>
> [Thu, 08 Dec 2011 20:17:22 GMT] [error] [<0.178.0>] {error_report,<0.31.0>,
>                       {<0.178.0>,crash_report,
>                        [[{initial_call,{couch_db_updater,init,['Argument__1']}},
>                          {pid,<0.178.0>},
>                          {registered_name,[]},
>                          {error_info,
>                           {exit,
>                            {timeout,
>                             {gen_server,call,
>                              [<0.177.0>,
>                               {db_updated,
>                                {db,<0.177.0>,<0.178.0>,nil,
>                                 <<"1323371411352029">>,<0.175.0>,<0.179.0>,
>                                 {db_header,5,13636863,0,
>                                  {6776455960,{13636861,0}},
>                                  {6776479023,13636861},
>                                  {1039786,[]},
>                                  0,nil,nil,1000},
>                                 13636863,
>                                 {btree,<0.175.0>,
>                                  {6776557909,{13637061,0}},
>                                  #Fun<couch_db_updater.10.19222179>,
>                                  #Fun<couch_db_updater.11.21515767>,
>                                  #Fun<couch_btree.5.112258129>,
>                                  #Fun<couch_db_updater.12.93888648>},
>                                 {btree,<0.175.0>,
>                                  {6776580448,13637061},
>                                  #Fun<couch_db_updater.13.40165027>,
>                                  #Fun<couch_db_updater.14.82810239>,
>                                  #Fun<couch_btree.5.112258129>,
>                                  #Fun<couch_db_updater.15.104121193>},
>                                 {btree,<0.175.0>,
>                                  {1039786,[]},
>                                  #Fun<couch_btree.0.23070627>,
>                                  #Fun<couch_btree.1.117278773>,
>                                  #Fun<couch_btree.2.112258129>,nil},
>                                 13637063,
>                                 <<"database2">>,
>                                 "/data/couchdb/data/database2.couch",
>                                 [],[],nil,
>                                 {user_ctx,null,[],undefined},
>                                 #Ref<0.0.30.133811>,1000,
>                                 [before_header,after_header,on_file_open],
>                                 false}}]}},
>                            [{gen_server,terminate,6},
>                             {proc_lib,init_p_do_apply,3}]}},
>                          {ancestors,[<0.177.0>,<0.174.0>]},
>                          {messages,
>                           [{'EXIT',<0.177.0>,shutdown},delayed_commit]},
>                          {links,[]},
>                          {dictionary,[]},
>                          {trap_exit,true},
>                          {status,running},
>                          {heap_size,121393},
>                          {stack_size,24},
>                          {reductions,83311172}],
>                         []]}}
>
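On the question in the quoted message about the "{timeout," part: in Erlang/OTP, a caller of gen_server:call exits with {timeout, {gen_server, call, [Server, Request]}} when the server does not reply in time, and a two-element argument list like the one here corresponds to gen_server:call/2, whose default timeout is 5000 ms. Read that way, the excerpt shows the updater <0.407.0> giving up on a {db_updated, ...} call to <0.406.0> (and <0.178.0> likewise on <0.177.0>); the term records a timed-out call between two processes and does not by itself say what the callee was busy doing. The Python sketch below is a hypothetical helper, regex-based rather than a real Erlang term parser, that pulls the target pid and request tag out of such a report, assuming the "> " quote markers have been stripped.

```python
import re
from typing import Optional, Tuple

# OTP reports a gen_server:call that got no reply in time as
#   {timeout, {gen_server, call, [Server, Request]}}
# (with a third element, the timeout, if call/3 was used).
# Assumption: Server is printed as a pid like <0.406.0>. The pattern
# captures that pid and the leading atom of the request.
TIMEOUT_CALL = re.compile(
    r"\{timeout,\s*\{gen_server,call,\s*\[\s*(<[\d.]+>),\s*\{?([a-z_]\w*)"
)


def timed_out_call(report: str) -> Optional[Tuple[str, str]]:
    """Return (target_pid, request_tag) for a timed-out gen_server:call, if any."""
    match = TIMEOUT_CALL.search(report)
    return (match.group(1), match.group(2)) if match else None
```

Run over the unquoted "Reason for termination" blocks above, it would yield ("<0.406.0>", "db_updated") and ("<0.177.0>", "db_updated").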

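The batched inserts described in the quoted message (up to 200 documents per request) correspond to CouchDB's POST /{db}/_bulk_docs endpoint, which takes a JSON body of the form {"docs": [...]}. A minimal Python sketch that only builds those request bodies; the helper names are hypothetical, the documents are assumed to be plain dicts like the log entries in the update_docs message above, and sending the requests with an HTTP client is left out.

```python
import json
from typing import Iterable, Iterator, List


def batches(docs: Iterable[dict], size: int = 200) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be positive")
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


def bulk_docs_bodies(docs: Iterable[dict], size: int = 200) -> Iterator[str]:
    """JSON bodies for POST /{db}/_bulk_docs, which expects {"docs": [...]}."""
    for batch in batches(docs, size):
        yield json.dumps({"docs": batch})
```

Keeping the batch size as a parameter makes it easy to try smaller batches while the server is under load from couchdb-lucene queries.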
