couchdb-user mailing list archives

From: Joan Touzet <woh...@apache.org>
Subject: Re: Crashing due to memory use
Date: Tue, 31 Jan 2017 18:12:12 GMT
Tayven,

Thanks for the info.

How much RAM is in this node? Do you know approximately how much RAM the beam.smp process
is consuming when the oom-killer takes action? Have you changed any settings in default.ini/local.ini?

-Joan

----- Original Message -----
> From: "Tayven Bigelow" <tbigelow@mobileaccord.com>
> To: "Jan Lehnardt" <jan@apache.org>, user@couchdb.apache.org
> Cc: "Nick Becker" <nick@mobileaccord.com>
> Sent: Tuesday, January 31, 2017 12:49:11 PM
> Subject: Re: Crashing due to memory use
> 
> Hey Jan!
> 
> 
> You'd be correct on the multiple postings; we weren't sure they were
> being posted.
> 
> We currently run this in production on Cloudant and were hoping to
> have a backup utilizing the new CouchDB 2.0. We are able to
> consistently replicate.
> 
> The memory leak happens when we kick off a new view.
> beam.smp is terminated by an OOM kill from the kernel.
> 
> Checking /var/log/syslog shows:
> Jan 31 18:32:44 couchdb7 kernel: [594086.565577] Out of memory: Kill
> process 23731 (beam.smp) score 961 or sacrifice child
> Jan 31 18:32:44 couchdb7 kernel: [594086.565622] Killed process 23773
> (memsup) total-vm:4228kB, anon-rss:12kB, file-rss:0kB
> Jan 31 18:32:44 couchdb7 kernel: [594086.569327] Out of memory: Kill
> process 23731 (beam.smp) score 961 or sacrifice child
> Jan 31 18:32:44 couchdb7 kernel: [594086.569392] Killed process 23731
> (beam.smp) total-vm:126594220kB, anon-rss:64708732kB, file-rss:0kB
> Jan 31 18:32:56 couchdb7 monit[9113]: 'couchdb' process is not
> running
> 
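The "Killed process" line above already says roughly how much memory beam.smp held when the oom-killer acted. As a minimal sketch (the function name `parseOomKill` is made up for illustration, and the regular expression only assumes the line layout shown in this excerpt), the kB figures can be converted to GiB like this:

```javascript
// Parse a kernel "Killed process" line and report its memory figures in GiB.
// The kernel's "kB" values are treated as KiB (1024 bytes).
const KIB_PER_GIB = 1024 * 1024;

function parseOomKill(line) {
  const m = line.match(
    /Killed process (\d+) \(([^)]+)\) total-vm:(\d+)kB, anon-rss:(\d+)kB, file-rss:(\d+)kB/
  );
  if (!m) return null; // not an oom-killer "Killed process" line
  return {
    pid: Number(m[1]),
    name: m[2],
    totalVmGiB: Number(m[3]) / KIB_PER_GIB,
    anonRssGiB: Number(m[4]) / KIB_PER_GIB,
    fileRssGiB: Number(m[5]) / KIB_PER_GIB,
  };
}

// The beam.smp line from the syslog excerpt, rejoined onto one line.
const sample =
  "Jan 31 18:32:44 couchdb7 kernel: [594086.569392] Killed process 23731 " +
  "(beam.smp) total-vm:126594220kB, anon-rss:64708732kB, file-rss:0kB";

const kill = parseOomKill(sample);
console.log(
  `${kill.name} (pid ${kill.pid}): ` +
  `${kill.anonRssGiB.toFixed(1)} GiB resident, ` +
  `${kill.totalVmGiB.toFixed(1)} GiB virtual`
);
```

For the line quoted here this works out to about 61.7 GiB resident, which is close to all of the 64GB the server is said to have.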
> The couchdb.log file at the time of crash contains:
> 
> 1981936-[debug] 2017-01-31T17:16:35.355774Z
> couchdb@couchdb7.geopoll.com <0.9036.262> -------- OS Process
> #Port<0.63437> Input  ::
> ["map_doc",{"_id":"bill-4690221d-fc07-4278-abdf-cabf1018ecb6","_rev":"5-b90c6c87a0a48e647528a1b3c5bfe12b","MetaData":{"PollId":"147402","CarrierId":"25504","UserPollStateId":"3362564708"},"UserId":"1002449829201","CreateDate":"2015-11-23T06:42:40.0285675Z","LastModifiedDate":"2015-11-23T06:43:07.5474967Z","SystemSource":"GeoPoll","AttemptCount":1,"BillingIdentifier":"bill-4690221d-fc07-4278-abdf-cabf1018ecb6","CallbackUri":"http://de-geopoll-1:8645/billingcallback","CallbackSent":true,"Activities":[{"MetaData":{},"CreateDate":"2015-11-23T06:42:59.0297329Z","State":"PROCESSING"},{"MetaData":{},"CreateDate":"2015-11-23T06:42:59.0307329Z","State":"SUCCESS"}],"Currency":"US_Dollar_USD","ConsumerIdentifier":"250025308","ToBeBilledIdentifier":"255763398389","BillType":"Carrier","BillProcessingStateAsString":"SUCCESS","Value":0.11,"BillProcessingState":"SUCCESS","BillingProvider":"TRANSFERTO","NextProcessingTime":"0001-01-01T00:00:00","NextProcessingTimeAsLong":0,"Id":"bill-4690221d-fc07-4278-abdf-cabf1018ecb6","CreatedDate":"2015-11-23T06:42:40.0285675Z","ModifiedDate":"2015-11-23T06:43:07.5474967Z","Type":"Bill"}]
> 1981937-[debug] 2017-01-31T17:16:35.355856Z
> couchdb@couchdb7.geopoll.com <0.11910.262> -------- OS Process
> #Port<0.63508> Output ::
> [[[["GeoPoll","8921801"],null]],[[["77802","PRETUPS"],null]],[[["77802","PRETUPS","SUCCESS","2014","03","05"],null],[["ALL","PRETUPS","SUCCESS","2014","03","05"],null],[["77802","ALL","SUCCESS","2014","03","05"],null],[["77802","PRETUPS","ALL","2014","03","05"],null],[["ALL","ALL","SUCCESS","2014","03","05"],null],[["ALL","PRETUPS","ALL","2014","03","05"],null],[["77802","ALL","ALL","2014","03","05"],null],[["ALL","ALL","ALL","2014","03","05"],null]],[[["77802","2014","3","05"],null]],[["254788760292",null]],[[["PRETUPS","25402","2014-03-05T12:48:59.5664722Z"],43]],[[["PRETUPS","2014-03-05T12:48:59.5664722Z"],43]],[[["PRETUPS","SUCCESS","2014-03-05T12:48:59.5664722Z"],null]],[[["PRETUPS","25402","SUCCESS","2014-03-05T12:48:59.5664722Z"],null]],[[["PRETUPS","25402","2014-03-05T12:48:59.5664722Z"],null]],[[["PRETUPS","2014-03-05T12:48:59.5664722Z"],null]],[[["PRETUPS"],null]],[["254788760292",null]],[["1000374925501",null]],[[[2014,3,5,"PRETUPS","SUCCESS"],null]]]
> 1981938-[debug] 2017-01-31T17:16:35.356012Z
> couchdb@couchdb7.geopoll.com <0.9036.262> -------- OS Process
> #Port<0.63437> Output ::
> [[[["147402","TRANSFERTO","SUCCESS"],null]],[[["TRANSFERTO","SUCCESS","2015-11-23T06:43:07.5474967Z"],null]],[[["TRANSFERTO","SUCCESS","0001-01-01T00:00:00"],null]]]
> 1981939-[debug] 2017-01-31T17:16:35.356108Z
> couchdb@couchdb7.geopoll.com <0.11910.262> -------- OS Process
> #Port<0.63508> Input  ::
> ["map_doc",{"_id":"bill-197d71d3-3091-47ef-9efe-b154161fcbfb","_rev":"3-832e63f45b45d5e3008b7e7bbe2b7392","MetaData":{"PollId":"77802","CarrierId":"25402","UserPollStateId":"3256532401","CarrierName":"Airtel-Kenya","Pretups.Version":"5.1","Pretups.Uri":"https://41.223.56.108:8093/pretups/C2SReceiver","Auth.Login":"pretups","Auth.Password":"0971500a350af5c3d1c0b12221a0558c","Auth.GatewayCode":"EXTGW","Auth.GatewayType":"EXTGW","Auth.ServicePort":"190","Auth.SourceType":"EXT","Cmd.ExtNwCode":"KE","Cmd.Msisdn":"732810086","Cmd.Pin":"2549","Cmd.Login":"","Cmd.Password":"","Cmd.ExtCode":"2468","CountryCode":"254","MobilePhoneLength":"9","TestMobileNumber":"254733621719","Currency":"KES"},"UserId":"1000277123401","CreateDate":"2014-03-05T13:45:49.6889321Z","LastModifiedDate":"2014-03-05T13:46:14.8050931Z","SystemSource":"GeoPoll","AttemptCount":1,"BillingIdentifier":"bill-197d71d3-3091-47ef-9efe-b154161fcbfb","CallbackUri":"http://uk-app-3:8645/billingcallback","Activities":[{"CreateDate":"2014-03-05T13:46:14.2902898Z","State":"PROCESSING"},{"MetaData":{"Type":"EXRCTRFRESP","Txnid":"R140305.1648.210003","Txnstatus":"200","Date":"05/03/2014
> 16:48:40","Extrefnum":"","Data":null},"CreateDate":"2014-03-05T13:46:14.2912898Z","State":"SUCCESS"}],"Currency":"Kenyan_Shilling_KES","ConsumerIdentifier":"8963201","ToBeBilledIdentifier":"254735960469","BillType":"Carrier","BillProcessingStateAsString":"SUCCESS","Value":43.0,"BillProcessingState":"SUCCESS","BillingProvider":"PRETUPS","NextProcessingTime":"0001-01-01T00:00:00","NextProcessingTimeAsLong":0,"Id":"bill-197d71d3-3091-47ef-9efe-b154161fcbfb","CreatedDate":"2014-03-05T13:45:49.6889321Z","ModifiedDate":"2014-03-05T13:46:14.8050931Z","Type":"Bill"}]
> 1981940:[debug] 2017-01-31T17:32:57.300061Z
> couchdb@couchdb7.geopoll.com <0.111.0> -------- Supervisor
> couch_log_sup started couch_log_monitor:start_link() at pid
> <0.114.0>
> 1981941:[debug] 2017-01-31T17:32:57.301585Z
> couchdb@couchdb7.geopoll.com <0.111.0> -------- Supervisor
> couch_log_sup started config_listener_mon:start_link(couch_log_sup,
> nil) at pid <0.115.0>
> 1981942:[info] 2017-01-31T17:32:57.301605Z
> couchdb@couchdb7.geopoll.com <0.7.0> -------- Application couch_log
> started on node 'couchdb@couchdb7.geopoll.com'
> 1981943:[debug] 2017-01-31T17:32:57.302447Z
> couchdb@couchdb7.geopoll.com <0.119.0> -------- Supervisor
> folsom_sup started folsom_sample_slide_sup:start_link() at pid
> <0.120.0>
> 1981944:[debug] 2017-01-31T17:32:57.303229Z
> couchdb@couchdb7.geopoll.com <0.119.0> -------- Supervisor
> folsom_sup started folsom_meter_timer_server:start_link() at pid
> <0.121.0>
> 1981945:[debug] 2017-01-31T17:32:57.303979Z
> couchdb@couchdb7.geopoll.com <0.119.0> -------- Supervisor
> folsom_sup started folsom_metrics_histogram_ets:start_link() at pid
> <0.122.0>
> 1981946:[info] 2017-01-31T17:32:57.304074Z
> couchdb@couchdb7.geopoll.com <0.7.0> -------- Application folsom
> started on node 'couchdb@couchdb7.geopoll.com'
> 1981947:[debug] 2017-01-31T17:32:57.325716Z
> couchdb@couchdb7.geopoll.com <0.126.0> -------- Supervisor
> couch_stats_sup started couch_stats_aggregator:start_link() at pid
> <0.127.0>
> 1981948:[debug] 2017-01-31T17:32:57.326519Z
> couchdb@couchdb7.geopoll.com <0.126.0> -------- Supervisor
> couch_stats_sup started couch_stats_process_tracker:start_link() at
> pid <0.177.0>
> 1981949:[info] 2017-01-31T17:32:57.326595Z
> couchdb@couchdb7.geopoll.com <0.7.0> -------- Application
> couch_stats started on node 'couchdb@couchdb7.geopoll.com'
> 1981950:[info] 2017-01-31T17:32:57.326673Z
> couchdb@couchdb7.geopoll.com <0.7.0> -------- Application khash
> started on node 'couchdb@couchdb7.geopoll.com'
> 1981951:[debug] 2017-01-31T17:32:57.330327Z
> couchdb@couchdb7.geopoll.com <0.182.0> -------- Supervisor
> couch_event_sup2 started couch_event_server:start_link() at pid
> <0.183.0>
> 1981952:[debug] 2017-01-31T17:32:57.331211Z
> couchdb@couchdb7.geopoll.com <0.185.0> -------- Supervisor
> couch_event_os_sup started
> config_listener_mon:start_link(couch_event_os_sup, nil) at pid
> <0.186.0>
> 1981953:[debug] 2017-01-31T17:32:57.331268Z
> couchdb@couchdb7.geopoll.com <0.182.0> -------- Supervisor
> couch_event_sup2 started couch_event_os_sup:start_link() at pid
> <0.185.0>
> 1981954:[info] 2017-01-31T17:32:57.331367Z
> couchdb@couchdb7.geopoll.com <0.7.0> -------- Application
> couch_event started on node 'couchdb@couchdb7.geopoll.com'
> 1981955:[debug] 2017-01-31T17:32:57.334167Z
> couchdb@couchdb7.geopoll.com <0.190.0> -------- Supervisor
> ibrowse_sup started ibrowse:start_link() at pid <0.191.0>
> 1981956:[info] 2017-01-31T17:32:57.334239Z
> couchdb@couchdb7.geopoll.com <0.7.0> -------- Application ibrowse
> started on node 'couchdb@couchdb7.geopoll.com'
> 1981957:[debug] 2017-01-31T17:32:57.335727Z
> couchdb@couchdb7.geopoll.com <0.196.0> -------- Supervisor ioq_sup
> started config_listener_mon:start_link(ioq_sup, nil) at pid
> <0.197.0>
> 1981958:[debug] 2017-01-31T17:32:57.336685Z
> couchdb@couchdb7.geopoll.com <0.196.0> -------- Supervisor ioq_sup
> started ioq:start_link() at pid <0.198.0>
> 1981959:[info] 2017-01-31T17:32:57.336756Z
> couchdb@couchdb7.geopoll.com <0.7.0> -------- Application ioq
> started on node 'couchdb@couchdb7.geopoll.com'
> 1981960:[info] 2017-01-31T17:32:57.336829Z
> couchdb@couchdb7.geopoll.com <0.7.0> -------- Application mochiweb
> started on node 'couchdb@couchdb7.geopoll.com'
> 1981961:[info] 2017-01-31T17:32:57.336899Z
> couchdb@couchdb7.geopoll.com <0.7.0> -------- Application oauth
> started on node 'couchdb@couchdb7.geopoll.com'
> 1981962:[info] 2017-01-31T17:32:57.340965Z
> couchdb@couchdb7.geopoll.com <0.204.0> -------- Apache CouchDB 2.0.0
> is starting.
> 
> 
> 
> For the large database it would happen when we kicked off 1 out of the
> 39 views on the database; on the smaller database, however, I would
> have to kick off all 5 views within the database.
> The large database has 9 design documents, with the smaller database
> having only 1.
> The views are all JS.
> Other than Fail2Ban, UFW, Logwatch, LogRotate, Monit and Zabbix-Agent
> there is nothing else running on the server, except when we build it
> with Dreyfus and Clouseau.
> 
> Example of one of the larger Design documents:
> {
>   "_id": "_design/bills",
>   "_rev": "4-b0ed6cf8f871391add5004f7e67bc3a8",
>   "language": "javascript",
>   "auto_update": true,
>   "views": {
>     "by_bill_date_and_bill_provider": {
>       "map": "function(doc) {\n  if (doc._id.indexOf(\"bill-\") === 0){\n      var date = new Date(doc.CreatedDate?doc.CreatedDate:doc.CreateDate);\n      var year = date.getFullYear();\n      var month = (date.getMonth() + 1);\n      var day = date.getDate();\n      emit([year, month, day, doc.BillingProvider, doc.BillProcessingState], null);\n  }\n}",
>       "reduce": "_count"
>     },
>     "by_poll_id_and_bill_date": {
>       "map": "function(doc) {\n  if ((doc._id.indexOf(\"bill-\") === 0) && doc.MetaData.PollId){\n    var date = new Date(doc.CreateDate);\n    var year = date.getFullYear().toString();\n    var month = (date.getMonth() + 1).toString();\n    var day = date.getDate().toString();\n    if (day.length == 1){\n      day = \"0\" + day;\n    }\n\n    emit([doc.MetaData.PollId, year, month, day], null);\n  }\n}",
>       "reduce": "_count"
>     }
>   }
> }
> 
> Example of a doc within the larger database:
> {
>   "_id": "bill-e2a5a7d1-3d9f-4f9b-b526-13b80b9e6947",
>   "_rev": "5-b40e00a54059c6c79004c0afd584fc60",
>   "MetaData": {
>     "PollId": "1844608",
>     "CarrierId": "2701",
>     "UserPollStateId": "12614468108"
>   },
>   "UserId": "1002196088104",
>   "CreateDate": "2017-01-31T07:20:58",
>   "LastModifiedDate": "2017-01-31T07:21:14.2473555Z",
>   "SystemSource": "GeoPoll",
>   "AttemptCount": 1,
>   "BillingIdentifier": "bill-e2a5a7d1-3d9f-4f9b-b526-13b80b9e6947",
>   "CallbackUri": "http://XXXXXXXXXXX:8645/billingcallback",
>   "CallbackSent": true,
>   "Activities": [
>     {
>       "MetaData": {},
>       "CreateDate": "2017-01-31T07:21:11.182049Z",
>       "State": "PROCESSING"
>     },
>     {
>       "MetaData": {
>         "VoucherPin": "",
>         "OrderRef": "113234210",
>         "TicketNumber": "",
>         "BoxNumber": "",
>         "BatchNumber": "",
>         "ProcessingTime": "3064.3064"
>       },
>       "CreateDate": "2017-01-31T07:21:11.1820491Z",
>       "State": "SUCCESS"
>     }
>   ],
>   "Currency": "South_African_Rand_ZAR",
>   "ConsumerIdentifier": "XXXXXXXXXXXX",
>   "ToBeBilledIdentifier": "XXXXXXXXXXXX",
>   "BillType": "Carrier",
>   "BillProcessingStateAsString": "SUCCESS",
>   "Value": 2,
>   "BillProcessingState": "SUCCESS",
>   "BillingProvider": "VODACOMSA",
>   "NextProcessingTime": "0001-01-01T00:00:00",
>   "NextProcessingTimeAsLong": 0,
>   "FinalProcessingTime": 0,
>   "LastSubmittedDate": "0001-01-01T00:00:00",
>   "Id": "bill-e2a5a7d1-3d9f-4f9b-b526-13b80b9e6947",
>   "CreatedDate": "2017-01-31T07:20:58",
>   "ModifiedDate": "2017-01-31T07:21:14.2473555Z",
>   "Type": "Bill"
> }
> 
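To see which keys a map function like the ones above produces for a given document, it can be run outside CouchDB with a stubbed `emit`. This is only a sketch: `runMap` and the factory wrapper are made-up helpers (CouchDB's query server supplies `emit` itself), the document is a trimmed copy of the example bill, and date parsing in Node.js may differ from the JavaScript engine that CouchDB 2.0 uses.

```javascript
// Local harness for a CouchDB-style map function. The stubbed emit() collects
// rows in an array instead of writing them to a view index.
function runMap(makeMapFn, doc) {
  const rows = [];
  const mapFn = makeMapFn((key, value) => rows.push({ key, value }));
  mapFn(doc);
  return rows;
}

// Same logic as the by_poll_id_and_bill_date view, wrapped in a factory so
// the stubbed emit can be injected.
const byPollIdAndBillDate = (emit) => function (doc) {
  if ((doc._id.indexOf("bill-") === 0) && doc.MetaData.PollId) {
    var date = new Date(doc.CreateDate);
    var year = date.getFullYear().toString();
    var month = (date.getMonth() + 1).toString();
    var day = date.getDate().toString();
    if (day.length == 1) {
      day = "0" + day;
    }
    emit([doc.MetaData.PollId, year, month, day], null);
  }
};

// Trimmed copy of the example bill document: only the fields the view reads.
const sampleDoc = {
  _id: "bill-e2a5a7d1-3d9f-4f9b-b526-13b80b9e6947",
  MetaData: { PollId: "1844608" },
  CreateDate: "2017-01-31T07:20:58",
};

const rows = runMap(byPollIdAndBillDate, sampleDoc);
console.log(JSON.stringify(rows));
```

The view pads the day but not the month, which matches keys such as `["77802","2014","3","05"]` in the debug log. Each document produces one small row per view, so the map output itself does not look like an obvious source of large allocations.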
> Docs usually go through 4-5 updates before they are finalized.
> Within the larger database we have 16,201,998 docs totaling 23 GB. No
> attachments.
> 
> No other traffic besides a single user (me), including replication.
> No other patterns that stand out (to me at least). The memory usage
> grows and grows before eventually consuming the swap space and
> running into an OOM kill.
> 
> The other 11 nodes are also affected.
> 
> Thanks for your assistance!!
> 
> -Tayven
> 
> ________________________________
> From: Jan Lehnardt <jan@apache.org>
> Sent: Tuesday, January 31, 2017 4:38 AM
> To: user@couchdb.apache.org
> Cc: Tayven Bigelow; Nick Becker
> Subject: Re: Crashing due to memory use
> 
> Heya Nick and Tayven,
> 
> I assume you posted multiple times because your mails didn’t show up
> immediately due to mailing list moderation.
> 
> You are correct that the database size and hardware configuration
> should not cause any issues.
> 
> Can you explain the scenario a little better?
> 
> Is the memory leak happening when building your views for the first
> time?
> 
> Does beam.smp terminate on its own or is it an OOM kill from the
> kernel?
> 
> How many views do you have?
> 
> How many design docs?
> 
> JS views or Erlang views?
> 
> Is there anything else running on these nodes?
> 
> Can you share your view code?
> 
> Can you share your couch.log?
> 
> Can you explain your document structure (total bytes, number of fields,
> attachments, etc.)?
> 
> Can you describe your traffic pattern?
> 
> Can you describe any other pattern that leads up to the memory leak?
> 
> Does this happen on all nodes? If not, is there anything special
> about the affected nodes?
> 
> 
> (shameless plug, if you require professional assistance, my email
> footer has contact information)
> 
> 
> > On 31 Jan 2017, at 00:15, Tayven Bigelow
> > <tbigelow@mobileaccord.com> wrote:
> >
> > Hey Guys!
> >
> >
> > Been using a 12-server CouchDB 2.0 cluster for a while now and have
> > noticed a memory leak that causes beam.smp to crash while
> > populating views.
> >
> > The q/r/w/n is set up as:
> >
> > [cluster]
> > q=12
> > r=2
> > w=2
> > n=3
> >
> > As far as I know the server should be able to handle the load as it
> > has 64GB RAM with a Core i7 6700. We are running Ubuntu 16.04.1.
> >
> > The Database is 16.5 GB in size.
> >
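As a rough back-of-the-envelope check on these settings, the sketch below works out how many shard copies each node holds and how much data that represents. It assumes the 12 nodes share shard copies evenly, that documents hash evenly across shards, and that 16.5 GB is the size of one full copy of the database. None of that is confirmed in the thread, and `shardLayout` is a made-up helper.

```javascript
// Settings from the [cluster] section; node count and database size are the
// figures quoted in this message.
const cluster = { q: 12, r: 2, w: 2, n: 3 };
const nodeCount = 12;
const dbSizeGB = 16.5;

function shardLayout({ q, n }, nodes, sizeGB) {
  const shardCopies = q * n;                 // shard files across the whole cluster
  const copiesPerNode = shardCopies / nodes; // assumes even placement
  const shardSizeGB = sizeGB / q;            // assumes documents hash evenly
  return {
    shardCopies,
    copiesPerNode,
    shardSizeGB,
    dataPerNodeGB: copiesPerNode * shardSizeGB,
  };
}

// r + w > n means every read quorum overlaps every write quorum.
const quorumsOverlap = ({ r, w, n }) => r + w > n;

const layout = shardLayout(cluster, nodeCount, dbSizeGB);
console.log(layout, "quorums overlap:", quorumsOverlap(cluster));
```

Under those assumptions each node holds 3 shard copies of about 1.4 GB each, roughly 4 GB of data per node, which is small next to 64GB of RAM.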
> >
> > I've also attempted to run 2.0 with Dreyfus and Clouseau and ran
> > into the same issue with a Database size of 7.8MB.
> >
> >
> > I've noted in previous releases some people have run into similar
> > memory issues with beam.smp and increasing the open file limit was
> > part of the resolution. We've increased the nofile limit for the
> > couchdb user to 4096 (as found here:
> > https://wiki.apache.org/couchdb/Performance ) with no luck.
> 
> 
> 
> >
> >
> > Nothing out of the ordinary is thrown in the logs. The only way to
> > catch it is by watching memory use.
> >
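Watching memory use can be automated by sampling the resident set size that Linux exposes in `/proc/<pid>/status`. The following is a sketch only: `parseVmRssKb` and `readRssKb` are made-up helper names, and how to find the beam.smp pid (for example from Monit or a pid file) is left out because the thread does not say how the service is managed.

```javascript
const fs = require("fs");

// Extract VmRSS (resident set size, in kB) from the text of /proc/<pid>/status.
function parseVmRssKb(statusText) {
  const m = statusText.match(/^VmRSS:\s+(\d+)\s+kB$/m);
  return m ? Number(m[1]) : null;
}

// Read the current resident memory of a process. Returns null if the process
// has exited or /proc is not available (non-Linux systems).
function readRssKb(pid) {
  try {
    return parseVmRssKb(fs.readFileSync(`/proc/${pid}/status`, "utf8"));
  } catch (err) {
    return null;
  }
}

// Illustrative usage: report this script's own RSS. For CouchDB, the pid of
// beam.smp would be passed instead.
const kb = readRssKb(process.pid);
console.log(kb === null ? "RSS unavailable" : `RSS: ${(kb / 1024).toFixed(1)} MiB`);
```

Calling `readRssKb` on a timer and logging the result with a timestamp would show whether growth starts when a view build is kicked off.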
> >
> > I'm wondering if there's a configuration/setting somewhere that I am
> > missing that could be causing this issue.
> >
> >
> > Thanks!
> >
> > Tayven
> >
> >
> >
> > All information in this message is confidential and may be legally
> > privileged. If you are not the intended recipient, notify the
> > sender immediately and destroy this email.
> 
> --
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
> 
> 
> 
> Email: couchdb@neighbourhood.ie
> 
> 
> 
