Return-Path: X-Original-To: apmail-couchdb-user-archive@www.apache.org Delivered-To: apmail-couchdb-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 0C060DAAB for ; Mon, 20 Aug 2012 15:47:51 +0000 (UTC) Received: (qmail 22998 invoked by uid 500); 20 Aug 2012 15:47:49 -0000 Delivered-To: apmail-couchdb-user-archive@couchdb.apache.org Received: (qmail 22957 invoked by uid 500); 20 Aug 2012 15:47:49 -0000 Mailing-List: contact user-help@couchdb.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@couchdb.apache.org Delivered-To: mailing list user@couchdb.apache.org Received: (qmail 22946 invoked by uid 99); 20 Aug 2012 15:47:49 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 20 Aug 2012 15:47:49 +0000 X-ASF-Spam-Status: No, hits=-0.7 required=5.0 tests=NORMAL_HTTP_TO_IP,RCVD_IN_DNSWL_LOW,SPF_PASS,WEIRD_PORT X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of tisdall@gmail.com designates 209.85.217.180 as permitted sender) Received: from [209.85.217.180] (HELO mail-lb0-f180.google.com) (209.85.217.180) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 20 Aug 2012 15:47:43 +0000 Received: by lbon10 with SMTP id n10so3221107lbo.11 for ; Mon, 20 Aug 2012 08:47:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type:content-transfer-encoding; bh=rAGqcUOTobf1FozLzU2jX3aKkhqCCYq1OpjO2hadoiI=; b=tZ/+I0uk893DBtEmPmputjed9VlDNxpm/NkFpygw0Azj5ZfEai1mfVT7ktgBQTV72k 1HYow77GPoNjlGx+65u5FhK5t0+uUTJTH3x0KlAPXfAg8Z/Amrlbud8F7/TFzC/NQ5Ty ZCodiNi9QyB+1QVFgmF5jfYIgZy+gUnp4zz1UZ5jYgtflhfN16JkC66E2KRwYQ20yXxp nMHQdf3pVfIDQBA2P7XJ0IbMwW5x22OHhmU3ifE9GnkYzAAoqHckCf8OB6g65LnBGJw6 77No9z/hFoRiv9yNx4pWIxSmUFMEddPHUpsMpeXEmonjDbbdN9IKZjlhxCU4a4UEoCtw HiCA== MIME-Version: 1.0 Received: by 10.152.112.234 with SMTP id it10mr14272787lab.36.1345477641882; Mon, 20 Aug 2012 08:47:21 -0700 (PDT) Received: by 10.112.49.170 with HTTP; Mon, 20 Aug 2012 08:47:21 -0700 (PDT) In-Reply-To: <9A3968B4-1F42-4A49-98DF-7D9F533870D8@apache.org> References: <9A3968B4-1F42-4A49-98DF-7D9F533870D8@apache.org> Date: Mon, 20 Aug 2012 11:47:21 -0400 Message-ID: Subject: Re: couchdb returning empty response From: Tim Tisdall To: user@couchdb.apache.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Virus-Checked: Checked by ClamAV on apache.org No, heart remains running. I suppose that's why couchdb nearly immediately comes back up after being killed. Looking at the logs again I am seeing oom-killer killing couchdb, so I guess there's a few things killing it due to lack of memory. On Sun, Aug 19, 2012 at 5:17 PM, Robert Newson wrote: > > Is one of those "other processes" called "heart", by any chance? > > B. > > On 19 Aug 2012, at 21:00, Tim Tisdall wrote: > >> stderr shows this when I hit an empty response: >> >> heart_beat_kill_pid =3D 17700 >> heart_beat_timeout =3D 11 >> Killed >> heart: Sun Aug 19 18:23:54 2012: Erlang has closed. >> heart: Sun Aug 19 18:23:55 2012: Executed "/usr/local/bin/couchdb -k". >> Terminating. >> heart_beat_kill_pid =3D 18390 >> heart_beat_timeout =3D 11 >> Killed >> heart: Sun Aug 19 18:35:18 2012: Erlang has closed. >> heart: Sun Aug 19 18:35:18 2012: Executed "/usr/local/bin/couchdb -k". >> Terminating. >> heart_beat_kill_pid =3D 18438 >> heart_beat_timeout =3D 11 >> >> >> So, it looks like the OS is killing the process because it's running >> out of memory. I can see in syslog that the oom-killer is killing >> processes at exactly the same time. What's strange, though, is >> there's no mention of oom-killer killing couchdb. There's only >> mentions of other processes being killed. >> >> >> On Sun, Aug 19, 2012 at 8:15 AM, Robert Newson wrot= e: >>> 3.9Mb isn't large enough to trigger memory issues on its own on a node = with 380M of ram. Can you use 'top' or 'atop' to see what memory consumptio= n was like before the crash? Erlang/OTP does usually report out of memory e= rrors when it crashes (to stderr which doesn't hit the .log file, iirc). >>> >>> B. >>> >>> >>> On 19 Aug 2012, at 11:30, CGS wrote: >>> >>>> On Sat, Aug 18, 2012 at 9:15 PM, Tim Tisdall wrote= : >>>> >>>>> So, it's possible that couchdb is running out of memory when >>>>> processing a large JSON file? >>>> >>>> >>>> Definitely. >>>> >>>> >>>>> From my last example I gave, the JSON >>>>> file is 3.9Mb which I didn't think was too big, but I do only have >>>>> ~380Mb of RAM. However, I am able to do several thousand similar >>>>> _bulk_doc updates of around the same size before I see the error... >>>>> are memory leaks possible with erlang? >>>> >>>> >>>> It looks more like a RAM limitation per process. There may be a memory >>>> leak, but I am not sure. >>>> >>>> >>>>> Also, why is there nothing in >>>>> the logs about running out of memory? (shouldn't that be something >>>>> the program is able to detect?) >>>>> >>>> >>>> It seems CouchDB doesn't catch this type of warnings. >>>> >>>> >>>>> >>>>> I switched over to using _bulk_doc's because the database grew way to= o >>>>> fast if I did only 1 update at a time. I'm doing about 5000 - 200000 >>>>> document updates each time I run my script so I've been doing the >>>>> updates in batches of 150. >>>>> >>>> >>>> I don't know about your requirements, but I remember a project in whic= h I >>>> created a round-robin to buffer and feed the docs to CouchDB. In that >>>> project I had to find an optimization in between the number of slices = and >>>> the number of docs I could store for being able to feed to CouchDB in = order >>>> to minimize the insertion time. Maybe this idea will help you in your >>>> project as well. >>>> >>>> CGS >>>> >>>> >>>> >>>>> >>>>> -Tim >>>>> >>>>> On Fri, Aug 17, 2012 at 9:33 PM, CGS wrote: >>>>>> I managed to reproduce the error: >>>>>> >>>>>> [Sat, 18 Aug 2012 00:57:38 GMT] [debug] [<0.170.0>] OAuth Params: [] >>>>>> [Sat, 18 Aug 2012 00:58:37 GMT] [debug] [<0.114.0>] Include Doc: >>>>>> <<"_design/_replicator">> {1, >>>>>> >>>>> <<91,250,44,153, >>>>>> >>>>> 238,254,43,46, >>>>>> >>>>>> 180,150,45,181, >>>>>> >>>>>> 10,163,207,212>>} >>>>>> [Sat, 18 Aug 2012 00:58:37 GMT] [info] [<0.32.0>] Apache CouchDB has >>>>>> started on http://0.0.0.0:5984/ >>>>>> >>>>>> ...and I think I identified also the problem: too long/large JSON. >>>>>> >>>>>> Here is how to reproduce the error: >>>>>> >>>>>> 1. CouchDB error level: debug >>>>>> 2. an extra-huge JSON file: echo -n "{\"docs\":[{\"key\":\"1\"}" > >>>>>> my_json.json && for var in $(seq 2 2000000) ; do echo -n >>>>>> ",{\"key\":\"${var}\"}" >> my_json.json ; done && echo -n "]}" >> >>>>>> my_json.json >>>>>> 3. attempting to send it with curl (requires to have database "test" >>>>>> already existing and preferably empty): >>>>>> >>>>>> curl -X POST http://127.0.0.7:5984/test/_bulk_docs -H 'Content-Type: >>>>>> application/json' -d @my_json.json > /dev/null >>>>>> % Total % Received % Xferd Average Speed Time Time Time >>>>>> Current >>>>>> Dload Upload Total Spent Left >>>>>> Speed >>>>>> 100 33.2M 0 0 100 33.2M 0 856k 0:00:39 0:00:39 --:-= -:-- >>>>>> 0 >>>>>> curl: (52) Empty reply from server >>>>>> >>>>>> Erlang shell report for the same problem: >>>>>> >>>>>> =3DINFO REPORT=3D=3D=3D=3D 18-Aug-2012::03:12:57 =3D=3D=3D >>>>>> alarm_handler: {set,{system_memory_high_watermark,[]}} >>>>>> >>>>>> =3DINFO REPORT=3D=3D=3D=3D 18-Aug-2012::03:12:57 =3D=3D=3D >>>>>> alarm_handler: {set,{process_memory_high_watermark,<0.149.0>}} >>>>>> /usr/local/lib/erlang/lib/os_mon-2.2.9/priv/bin/memsup: Erlang has >>>>>> closed.Erlang has closed >>>>>> >>>>>> Tim, try to split your JSON in smaller pieces. Bulk operations tend = to >>>>> use >>>>>> a lot of memory. >>>>>> >>>>>> The _design/_replicator error comes with multipart file set by cURL = by >>>>>> default in such cases. Once a second piece is sent toward the server= , the >>>>>> crash is registered. The first piece report looks like: >>>>>> >>>>>> [Sat, 18 Aug 2012 00:57:38 GMT] [debug] [<0.170.0>] 'POST' >>>>> /test/_bulk_docs >>>>>> {1,1} from "127.0.0.1" >>>>>> >>>>>> I hope this info may help. >>>>>> >>>>>> CGS >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Fri, Aug 17, 2012 at 7:30 PM, Tim Tisdall wro= te: >>>>>> >>>>>>> Okay, so it always states that _replicator line any time I manually >>>>>>> restart the server. I think it's just a standard logging message w= hen >>>>>>> the level is set to "debug". >>>>>>> >>>>>>> On Fri, Aug 17, 2012 at 1:13 PM, Tim Tisdall wr= ote: >>>>>>>> No. All my ids (except for design documents) are strings containi= ng >>>>>>>> integers. Also, none of my design documents are called anything l= ike >>>>>>>> "_replicator". The only thing with that name is in the _replicato= r >>>>>>>> database which I'm not doing anything with. >>>>>>>> >>>>>>>> Why does it say "Include Doc"? And what's that series of numbers >>>>>>>> afterwards? That log message seems to consistently occur just bef= ore >>>>>>>> the log message about the server starting. Is that just a normal >>>>>>>> message you get when the server restarts and you have logging set = to >>>>>>>> "debug"? >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Aug 17, 2012 at 1:03 PM, Robert Newson >>>>>>> wrote: >>>>>>>>> >>>>>>>>> Does app_stats_test contain a document called _design/_replicator= or >>>>> is >>>>>>> a document with that id in the body of your bulk post? >>>>>>>>> >>>>>>>>> B. >>>>>>>>> >>>>>>>>> On 17 Aug 2012, at 17:52, Tim Tisdall wrote: >>>>>>>>> >>>>>>>>>> I do have UTF8 characters in the JSON, but isn't that acceptable= ? I >>>>>>>>>> have no problem retrieving UTF8 encoded content from the server = and >>>>> I >>>>>>>>>> have a bunch of it saved in there already too. >>>>>>>>>> >>>>>>>>>> On Fri, Aug 17, 2012 at 10:35 AM, CGS wro= te: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> Do you have somehow special characters (non-latin1 ones) in you= r >>>>>>> JSON? That >>>>>>>>>>> error looks strangely close to trying to transform a list of >>>>> unicode >>>>>>>>>>> characters into a binary. I might be wrong though. >>>>>>>>>>> >>>>>>>>>>> CGS >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Aug 17, 2012 at 4:09 PM, Tim Tisdall >>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> I thought I added that to the init script before when you >>>>> mentioned >>>>>>>>>>>> it, but I checked and it was gone. I added a "cd ~couchdb" in >>>>> there >>>>>>>>>>>> and now I no longer get eaccess errors, but the process still >>>>> crashes >>>>>>>>>>>> with very little information: >>>>>>>>>>>> >>>>>>>>>>>> [Fri, 17 Aug 2012 14:01:44 GMT] [debug] [<0.1372.0>] 'POST' >>>>>>>>>>>> /app_stats_test/_bulk_docs {1,0} from "127.0.0.1" >>>>>>>>>>>> Headers: [{'Accept',"*/*"}, >>>>>>>>>>>> {'Content-Length',"3902444"}, >>>>>>>>>>>> {'Content-Type',"application/json"}, >>>>>>>>>>>> {'Host',"localhost:5984"}] >>>>>>>>>>>> [Fri, 17 Aug 2012 14:01:44 GMT] [debug] [<0.1372.0>] OAuth >>>>> Params: [] >>>>>>>>>>>> [Fri, 17 Aug 2012 14:02:16 GMT] [debug] [<0.115.0>] Include Do= c: >>>>>>>>>>>> <<"_design/_replicator">> {1, >>>>>>>>>>>> >>>>>>>>>>>> <<91,250,44,153, >>>>>>>>>>>> >>>>>>>>>>>> 238,254,43,46, >>>>>>>>>>>> >>>>>>>>>>>> 180,150,45,181, >>>>>>>>>>>> >>>>>>>>>>>> 10,163,207,212>>} >>>>>>>>>>>> [Fri, 17 Aug 2012 14:02:17 GMT] [info] [<0.32.0>] Apache Couch= DB >>>>> has >>>>>>>>>>>> started on http://127.0.0.1:5984/ >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Someone mentioned seeing the JSON that I'm submitting... Woul= dn't >>>>>>>>>>>> mal-formed JSON throw an error? >>>>>>>>>>>> >>>>>>>>>>>> -Tim >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Aug 17, 2012 at 4:33 AM, Robert Newson < >>>>> rnewson@apache.org> >>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> I've seen couchdb start despite the eacces errors before and >>>>>>> tracked it >>>>>>>>>>>> down to the current working directory setting. It seems that t= he >>>>> cwd >>>>>>> is >>>>>>>>>>>> searched first, and then erlang looks elsewhere. So, if our >>>>> startup >>>>>>> script >>>>>>>>>>>> doesn't change it to somewhere that the couchdb user can read,= you >>>>>>> get >>>>>>>>>>>> spurious eacces errors. >>>>>>>>>>>>> >>>>>>>>>>>>> Don't ask me how I know this. >>>>>>>>>>>>> >>>>>>>>>>>>> B. >>>>>>>>>>>>> >>>>>>>>>>>>> On 16 Aug 2012, at 20:19, Tim Tisdall wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Paul, did you ever solve the eaccess problem you had describ= ed >>>>>>> here: >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>> >>>>> http://mail-archives.apache.org/mod_mbox/couchdb-user/201106.mbox/%3C= 4E0B304F.5080109@lymegreen.co.uk%3E >>>>>>>>>>>>>> I found that post from doing Google searches for my issue. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Aug 14, 2012 at 11:41 PM, Paul Davis >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> On Tue, Aug 14, 2012 at 9:38 PM, Tim Tisdall < >>>>> tisdall@gmail.com> >>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> I'm still having problems with couchdb, but I'm trying out >>>>>>> different >>>>>>>>>>>>>>>> things to see if I can narrow down what the problem is... >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I stopped using fsockopen() in PHP and am using curl now t= o >>>>>>> hopefully >>>>>>>>>>>>>>>> be able to see more debugging info. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I get an empty response when sending a POST to _bulk_docs. >>>>> From >>>>>>> the >>>>>>>>>>>>>>>> couch logs it seems like the server restarts in the middle= of >>>>>>>>>>>>>>>> processing the request. Here's what I have in my logs: (= I >>>>> have >>>>>>> no >>>>>>>>>>>>>>>> idea what the _replicator portion is about there, I'm >>>>> currently >>>>>>> not >>>>>>>>>>>>>>>> using it) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:30 GMT] [debug] [<0.1255.0>] 'POST= ' >>>>>>>>>>>>>>>> /app_stats_test/_bulk_docs {1,0} from "127.0.0.1" >>>>>>>>>>>>>>>> Headers: [{'Accept',"*/*"}, >>>>>>>>>>>>>>>> {'Content-Length',"2802300"}, >>>>>>>>>>>>>>>> {'Content-Type',"application/json"}, >>>>>>>>>>>>>>>> {'Host',"localhost:5984"}] >>>>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:30 GMT] [debug] [<0.1255.0>] OAuth >>>>>>> Params: [] >>>>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:45 GMT] [debug] [<0.115.0>] Includ= e >>>>> Doc: >>>>>>>>>>>>>>>> <<"_design/_replicator">> {1, >>>>>>>>>>>>>>>> >>>>>>>>>>>> <<91,250,44,153, >>>>>>>>>>>>>>>> >>>>>>>>>>>> 238,254,43,46, >>>>>>>>>>>>>>>> >>>>>>>>>>>> 180,150,45,181, >>>>>>>>>>>>>>>> >>>>>>>>>>>> 10,163,207,212>>} >>>>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:45 GMT] [info] [<0.32.0>] Apache >>>>> CouchDB >>>>>>> has >>>>>>>>>>>>>>>> started on http://127.0.0.1:5984/ >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> In my code logs I have the following by running curl in >>>>> verbose >>>>>>> mode: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> * About to connect() to localhost port 5984 (#0) >>>>>>>>>>>>>>>> * Trying 127.0.0.1... * connected >>>>>>>>>>>>>>>> * Connected to localhost (127.0.0.1) port 5984 (#0) >>>>>>>>>>>>>>>>> POST /app_stats_test/_bulk_docs HTTP/1.0 >>>>>>>>>>>>>>>> Host: localhost:5984 >>>>>>>>>>>>>>>> Accept: */* >>>>>>>>>>>>>>>> Content-Type: application/json >>>>>>>>>>>>>>>> Content-Length: 2802300 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> * Empty reply from server >>>>>>>>>>>>>>>> * Connection #0 to host localhost left intact >>>>>>>>>>>>>>>> curl error: 52 : Empty reply from server >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I also tried using HTTP/1.1 and I get an empty response af= ter >>>>>>>>>>>>>>>> receiving only a "100 Continue", but the end result appear= s >>>>> the >>>>>>> same. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -Tim >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> If you have a request that triggers this, a good way to cat= ch >>>>> it >>>>>>> is >>>>>>>>>>>> like such: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> $ /usr/local/bin/couchdb # or however you start it >>>>>>>>>>>>>>> $ ps ax | grep beam.smp # Get the pid of couchdb >>>>>>>>>>>>>>> $ gdb >>>>>>>>>>>>>>> (gdb) attach $pid # Where $pid was just found with ps. >>>>> Might >>>>>>>>>>>>>>> throw up an access prompt >>>>>>>>>>>>>>> (gdb) continue >>>>>>>>>>>>>>> # At this point, run the command that makes couchdb rebo= ot >>>>>>> in a >>>>>>>>>>>>>>> # different console. If it happens you should see Gdb >>>>> notice >>>>>>> the >>>>>>>>>>>>>>> # error. Then the following: >>>>>>>>>>>>>>> (gdb) t a a bt >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> And that should spew out a bunch of stack traces. If you ca= n >>>>> get >>>>>>> that >>>>>>>>>>>>>>> we should be able to fairly specifically narrow down the is= sue. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>> >