Is one of those "other processes" called "heart", by any chance?

B.

On 19 Aug 2012, at 21:00, Tim Tisdall wrote:

> stderr shows this when I hit an empty response:
>
> heart_beat_kill_pid = 17700
> heart_beat_timeout = 11
> Killed
> heart: Sun Aug 19 18:23:54 2012: Erlang has closed.
> heart: Sun Aug 19 18:23:55 2012: Executed "/usr/local/bin/couchdb -k".
> Terminating.
> heart_beat_kill_pid = 18390
> heart_beat_timeout = 11
> Killed
> heart: Sun Aug 19 18:35:18 2012: Erlang has closed.
> heart: Sun Aug 19 18:35:18 2012: Executed "/usr/local/bin/couchdb -k".
> Terminating.
> heart_beat_kill_pid = 18438
> heart_beat_timeout = 11
>
> So, it looks like the OS is killing the process because it's running
> out of memory. I can see in syslog that the oom-killer is killing
> processes at exactly the same time. What's strange, though, is that
> there's no mention of oom-killer killing couchdb. There are only
> mentions of other processes being killed.
>
> On Sun, Aug 19, 2012 at 8:15 AM, Robert Newson wrote:
>> 3.9 MB isn't large enough to trigger memory issues on its own on a node
>> with 380 MB of RAM. Can you use 'top' or 'atop' to see what memory
>> consumption was like before the crash? Erlang/OTP does usually report
>> out-of-memory errors when it crashes (to stderr, which doesn't hit the
>> .log file, iirc).
>>
>> B.
>>
>> On 19 Aug 2012, at 11:30, CGS wrote:
>>
>>> On Sat, Aug 18, 2012 at 9:15 PM, Tim Tisdall wrote:
>>>
>>>> So, it's possible that couchdb is running out of memory when
>>>> processing a large JSON file?
>>>
>>> Definitely.
>>>
>>>> In the last example I gave, the JSON file is 3.9 MB, which I didn't
>>>> think was too big, but I do only have ~380 MB of RAM. However, I am
>>>> able to do several thousand similar _bulk_docs updates of around the
>>>> same size before I see the error... are memory leaks possible with
>>>> Erlang?
>>>
>>> It looks more like a RAM limitation per process. There may be a memory
>>> leak, but I am not sure.
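Robert's question can be answered from the kernel log itself rather than by eyeballing syslog. The sketch below is hypothetical: the `oom_victims` name and the default `/var/log/syslog` path are assumptions (Red Hat-style systems log to `/var/log/messages`), and it assumes the kernel reports each OOM kill with a line containing `Killed process PID (NAME)`; the exact wording differs between kernel versions.

```sh
#!/bin/sh
# Hypothetical helper: print "PID NAME" for each process that the kernel's
# OOM killer reports as killed in a log file (default: /var/log/syslog).
# Assumes lines like "... kernel: Killed process 1234 (beam.smp) ...";
# adjust the pattern if your kernel words this differently.
oom_victims() {
    log="${1:-/var/log/syslog}"
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p' "$log"
}

# Usage: count how often each program name was chosen as a victim.
# oom_victims /var/log/syslog | awk '{ print $2 }' | sort | uniq -c
```

If `beam.smp` (the Erlang VM that runs CouchDB) or `heart` (its heartbeat watchdog) shows up at the times in the stderr excerpt, the CouchDB processes were themselves OOM victims, even though "couchdb" never appears by name.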
>>>
>>>> Also, why is there nothing in the logs about running out of memory?
>>>> (Shouldn't that be something the program is able to detect?)
>>>
>>> It seems CouchDB doesn't catch this type of warning.
>>>
>>>> I switched over to using _bulk_docs because the database grew way too
>>>> fast if I did only 1 update at a time. I'm doing about 5000 - 200000
>>>> document updates each time I run my script, so I've been doing the
>>>> updates in batches of 150.
>>>
>>> I don't know about your requirements, but I remember a project in which I
>>> created a round-robin to buffer and feed the docs to CouchDB. In that
>>> project I had to find the best balance between the number of slices and
>>> the number of docs I could store before feeding them to CouchDB, in order
>>> to minimize the insertion time. Maybe this idea will help you in your
>>> project as well.
>>>
>>> CGS
>>>
>>>> -Tim
>>>>
>>>> On Fri, Aug 17, 2012 at 9:33 PM, CGS wrote:
>>>>> I managed to reproduce the error:
>>>>>
>>>>> [Sat, 18 Aug 2012 00:57:38 GMT] [debug] [<0.170.0>] OAuth Params: []
>>>>> [Sat, 18 Aug 2012 00:58:37 GMT] [debug] [<0.114.0>] Include Doc:
>>>>> <<"_design/_replicator">> {1,
>>>>>                            <<91,250,44,153,
>>>>>                              238,254,43,46,
>>>>>                              180,150,45,181,
>>>>>                              10,163,207,212>>}
>>>>> [Sat, 18 Aug 2012 00:58:37 GMT] [info] [<0.32.0>] Apache CouchDB has
>>>>> started on http://0.0.0.0:5984/
>>>>>
>>>>> ...and I think I also identified the problem: too long/large JSON.
>>>>>
>>>>> Here is how to reproduce the error:
>>>>>
>>>>> 1. CouchDB log level: debug
>>>>> 2. an extra-huge JSON file:
>>>>>    echo -n "{\"docs\":[{\"key\":\"1\"}" > my_json.json && for var in $(seq 2 2000000) ; do echo -n ",{\"key\":\"${var}\"}" >> my_json.json ; done && echo -n "]}" >> my_json.json
>>>>> 3.
>>>>>    attempting to send it with curl (requires the database "test" to
>>>>>    exist already, preferably empty):
>>>>>
>>>>> curl -X POST http://127.0.0.7:5984/test/_bulk_docs -H 'Content-Type: application/json' -d @my_json.json > /dev/null
>>>>>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>>>>>                                  Dload  Upload   Total   Spent    Left  Speed
>>>>> 100 33.2M    0     0  100 33.2M      0   856k  0:00:39  0:00:39 --:--:--     0
>>>>> curl: (52) Empty reply from server
>>>>>
>>>>> Erlang shell report for the same problem:
>>>>>
>>>>> =INFO REPORT==== 18-Aug-2012::03:12:57 ===
>>>>>     alarm_handler: {set,{system_memory_high_watermark,[]}}
>>>>>
>>>>> =INFO REPORT==== 18-Aug-2012::03:12:57 ===
>>>>>     alarm_handler: {set,{process_memory_high_watermark,<0.149.0>}}
>>>>> /usr/local/lib/erlang/lib/os_mon-2.2.9/priv/bin/memsup: Erlang has closed.Erlang has closed
>>>>>
>>>>> Tim, try to split your JSON into smaller pieces. Bulk operations tend to
>>>>> use a lot of memory.
>>>>>
>>>>> The _design/_replicator error comes with the multipart file set by cURL
>>>>> by default in such cases. Once a second piece is sent to the server, the
>>>>> crash is registered. The report for the first piece looks like:
>>>>>
>>>>> [Sat, 18 Aug 2012 00:57:38 GMT] [debug] [<0.170.0>] 'POST' /test/_bulk_docs {1,1} from "127.0.0.1"
>>>>>
>>>>> I hope this info helps.
>>>>>
>>>>> CGS
>>>>>
>>>>> On Fri, Aug 17, 2012 at 7:30 PM, Tim Tisdall wrote:
>>>>>
>>>>>> Okay, so it always states that _replicator line any time I manually
>>>>>> restart the server. I think it's just a standard logging message when
>>>>>> the level is set to "debug".
>>>>>>
>>>>>> On Fri, Aug 17, 2012 at 1:13 PM, Tim Tisdall wrote:
>>>>>>> No. All my ids (except for design documents) are strings containing
>>>>>>> integers. Also, none of my design documents are called anything like
>>>>>>> "_replicator".
>>>>>>> The only thing with that name is in the _replicator
>>>>>>> database, which I'm not doing anything with.
>>>>>>>
>>>>>>> Why does it say "Include Doc"? And what's that series of numbers
>>>>>>> afterwards? That log message seems to consistently occur just before
>>>>>>> the log message about the server starting. Is that just a normal
>>>>>>> message you get when the server restarts and you have logging set to
>>>>>>> "debug"?
>>>>>>>
>>>>>>> On Fri, Aug 17, 2012 at 1:03 PM, Robert Newson wrote:
>>>>>>>>
>>>>>>>> Does app_stats_test contain a document called _design/_replicator, or
>>>>>>>> is a document with that id in the body of your bulk post?
>>>>>>>>
>>>>>>>> B.
>>>>>>>>
>>>>>>>> On 17 Aug 2012, at 17:52, Tim Tisdall wrote:
>>>>>>>>
>>>>>>>>> I do have UTF-8 characters in the JSON, but isn't that acceptable? I
>>>>>>>>> have no problem retrieving UTF-8 encoded content from the server, and
>>>>>>>>> I have a bunch of it saved in there already too.
>>>>>>>>>
>>>>>>>>> On Fri, Aug 17, 2012 at 10:35 AM, CGS wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Do you somehow have special characters (non-latin1 ones) in your
>>>>>>>>>> JSON? That error looks strangely close to trying to transform a list
>>>>>>>>>> of Unicode characters into a binary. I might be wrong, though.
>>>>>>>>>>
>>>>>>>>>> CGS
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 17, 2012 at 4:09 PM, Tim Tisdall wrote:
>>>>>>>>>>
>>>>>>>>>>> I thought I added that to the init script before when you mentioned
>>>>>>>>>>> it, but I checked and it was gone.
>>>>>>>>>>> I added a "cd ~couchdb" in there
>>>>>>>>>>> and now I no longer get eacces errors, but the process still crashes
>>>>>>>>>>> with very little information:
>>>>>>>>>>>
>>>>>>>>>>> [Fri, 17 Aug 2012 14:01:44 GMT] [debug] [<0.1372.0>] 'POST' /app_stats_test/_bulk_docs {1,0} from "127.0.0.1"
>>>>>>>>>>> Headers: [{'Accept',"*/*"},
>>>>>>>>>>>           {'Content-Length',"3902444"},
>>>>>>>>>>>           {'Content-Type',"application/json"},
>>>>>>>>>>>           {'Host',"localhost:5984"}]
>>>>>>>>>>> [Fri, 17 Aug 2012 14:01:44 GMT] [debug] [<0.1372.0>] OAuth Params: []
>>>>>>>>>>> [Fri, 17 Aug 2012 14:02:16 GMT] [debug] [<0.115.0>] Include Doc:
>>>>>>>>>>> <<"_design/_replicator">> {1,
>>>>>>>>>>>                            <<91,250,44,153,
>>>>>>>>>>>                              238,254,43,46,
>>>>>>>>>>>                              180,150,45,181,
>>>>>>>>>>>                              10,163,207,212>>}
>>>>>>>>>>> [Fri, 17 Aug 2012 14:02:17 GMT] [info] [<0.32.0>] Apache CouchDB has
>>>>>>>>>>> started on http://127.0.0.1:5984/
>>>>>>>>>>>
>>>>>>>>>>> Someone mentioned seeing the JSON that I'm submitting... Wouldn't
>>>>>>>>>>> malformed JSON throw an error?
>>>>>>>>>>>
>>>>>>>>>>> -Tim
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Aug 17, 2012 at 4:33 AM, Robert Newson <rnewson@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I've seen couchdb start despite the eacces errors before and tracked
>>>>>>>>>>>> it down to the current working directory setting. It seems that the
>>>>>>>>>>>> cwd is searched first, and then Erlang looks elsewhere. So, if our
>>>>>>>>>>>> startup script doesn't change it to somewhere that the couchdb user
>>>>>>>>>>>> can read, you get spurious eacces errors.
>>>>>>>>>>>>
>>>>>>>>>>>> Don't ask me how I know this.
>>>>>>>>>>>>
>>>>>>>>>>>> B.
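The "cd ~couchdb" that Tim added can be packaged as a small init-script helper. A minimal sketch, assuming a `couchdb` system user, `getent` (present on glibc-based Linux systems) and the `/usr/local/bin/couchdb` launcher used elsewhere in this thread; the helper names `user_home` and `run_from_home` are hypothetical.

```sh
#!/bin/sh
# Hypothetical init-script helpers. The goal is only that the Erlang VM does
# not inherit a working directory its runtime user cannot read, which
# (per Robert's explanation above) can cause spurious eacces errors.

# Print a user's home directory from the passwd database (empty if unknown).
user_home() {
    getent passwd "$1" | cut -d: -f6
}

# Run a command with the given user's home directory as its working
# directory. The subshell leaves the caller's own cwd unchanged.
run_from_home() {
    home=$(user_home "$1")
    shift
    if [ -z "$home" ] || [ ! -d "$home" ]; then
        echo "run_from_home: no usable home directory for that user" >&2
        return 1
    fi
    ( cd "$home" && "$@" )
}

# In the init script (keep whatever su/start-stop-daemon wrapper and options
# the script already uses around the launcher):
# run_from_home couchdb /usr/local/bin/couchdb
```

Failing loudly when the home directory is missing avoids silently starting the VM from whatever unreadable directory the init system happened to use.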
>>>>>>>>>>>>
>>>>>>>>>>>> On 16 Aug 2012, at 20:19, Tim Tisdall wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Paul, did you ever solve the eacces problem you had described here:
>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/couchdb-user/201106.mbox/%3C4E0B304F.5080109@lymegreen.co.uk%3E
>>>>>>>>>>>>> I found that post from doing Google searches for my issue.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Aug 14, 2012 at 11:41 PM, Paul Davis wrote:
>>>>>>>>>>>>>> On Tue, Aug 14, 2012 at 9:38 PM, Tim Tisdall <tisdall@gmail.com> wrote:
>>>>>>>>>>>>>>> I'm still having problems with couchdb, but I'm trying out different
>>>>>>>>>>>>>>> things to see if I can narrow down what the problem is...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I stopped using fsockopen() in PHP and am using curl now to hopefully
>>>>>>>>>>>>>>> be able to see more debugging info.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I get an empty response when sending a POST to _bulk_docs. From the
>>>>>>>>>>>>>>> couch logs it seems like the server restarts in the middle of
>>>>>>>>>>>>>>> processing the request.
>>>>>>>>>>>>>>> Here's what I have in my logs: (I have no
>>>>>>>>>>>>>>> idea what the _replicator portion is about there, I'm currently not
>>>>>>>>>>>>>>> using it)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:30 GMT] [debug] [<0.1255.0>] 'POST' /app_stats_test/_bulk_docs {1,0} from "127.0.0.1"
>>>>>>>>>>>>>>> Headers: [{'Accept',"*/*"},
>>>>>>>>>>>>>>>           {'Content-Length',"2802300"},
>>>>>>>>>>>>>>>           {'Content-Type',"application/json"},
>>>>>>>>>>>>>>>           {'Host',"localhost:5984"}]
>>>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:30 GMT] [debug] [<0.1255.0>] OAuth Params: []
>>>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:45 GMT] [debug] [<0.115.0>] Include Doc:
>>>>>>>>>>>>>>> <<"_design/_replicator">> {1,
>>>>>>>>>>>>>>>                            <<91,250,44,153,
>>>>>>>>>>>>>>>                              238,254,43,46,
>>>>>>>>>>>>>>>                              180,150,45,181,
>>>>>>>>>>>>>>>                              10,163,207,212>>}
>>>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:45 GMT] [info] [<0.32.0>] Apache CouchDB has
>>>>>>>>>>>>>>> started on http://127.0.0.1:5984/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In my code logs I have the following by running curl in verbose mode:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> * About to connect() to localhost port 5984 (#0)
>>>>>>>>>>>>>>> * Trying 127.0.0.1...
>>>>>>>>>>>>>>> * connected
>>>>>>>>>>>>>>> * Connected to localhost (127.0.0.1) port 5984 (#0)
>>>>>>>>>>>>>>> > POST /app_stats_test/_bulk_docs HTTP/1.0
>>>>>>>>>>>>>>> Host: localhost:5984
>>>>>>>>>>>>>>> Accept: */*
>>>>>>>>>>>>>>> Content-Type: application/json
>>>>>>>>>>>>>>> Content-Length: 2802300
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> * Empty reply from server
>>>>>>>>>>>>>>> * Connection #0 to host localhost left intact
>>>>>>>>>>>>>>> curl error: 52 : Empty reply from server
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I also tried using HTTP/1.1 and I get an empty response after
>>>>>>>>>>>>>>> receiving only a "100 Continue", but the end result appears the same.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Tim
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you have a request that triggers this, a good way to catch it is
>>>>>>>>>>>>>> like so:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ /usr/local/bin/couchdb # or however you start it
>>>>>>>>>>>>>> $ ps ax | grep beam.smp # Get the pid of couchdb
>>>>>>>>>>>>>> $ gdb
>>>>>>>>>>>>>> (gdb) attach $pid # Where $pid was just found with ps. Might throw up an access prompt
>>>>>>>>>>>>>> (gdb) continue
>>>>>>>>>>>>>> # At this point, run the command that makes couchdb reboot in a
>>>>>>>>>>>>>> # different console. If it happens, you should see gdb notice the
>>>>>>>>>>>>>> # error. Then the following:
>>>>>>>>>>>>>> (gdb) t a a bt
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And that should spew out a bunch of stack traces. If you can get that,
>>>>>>>>>>>>>> we should be able to fairly specifically narrow down the issue.
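Paul's interactive recipe can also be scripted, which helps when the crash takes a while to reproduce. A sketch assuming Linux (`/proc/PID/comm`) and gdb's standard `-p`, `-batch` and `-ex` options; `pids_by_comm` and `backtrace_on_crash` are hypothetical helper names. One caveat, given the `Killed` lines in Tim's stderr output: a SIGKILL from the OOM killer cannot be intercepted by a debugger, so this only yields a useful stack when the VM dies from a signal gdb can stop on (for example SIGSEGV or SIGABRT).

```sh
#!/bin/sh
# Hypothetical helpers that script Paul's gdb recipe (Linux only).

# Print the PID of every process whose command name is exactly "$1",
# as recorded in /proc/PID/comm (the Erlang VM shows up as "beam.smp").
pids_by_comm() {
    for d in /proc/[0-9]*; do
        # A process may exit between the glob and the read; skip it if so.
        { read -r c < "$d/comm"; } 2>/dev/null || continue
        if [ "$c" = "$1" ]; then
            echo "${d#/proc/}"
        fi
    done
}

# Attach to PID "$1", let it run until gdb stops it on a signal, then
# print a backtrace of every thread ("t a a bt" is short for
# "thread apply all bt"). SIGPIPE is passed through without stopping,
# so a dropped client connection does not end the session early.
backtrace_on_crash() {
    gdb -p "$1" -batch \
        -ex 'handle SIGPIPE nostop noprint pass' \
        -ex continue \
        -ex 'thread apply all bt'
}

# Usage (attaching usually needs root or a relaxed kernel.yama.ptrace_scope):
# pid=$(pids_by_comm beam.smp | head -n 1)
# backtrace_on_crash "$pid" > couchdb-backtrace.txt 2>&1
```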