couchdb-user mailing list archives

From Tim Tisdall <tisd...@gmail.com>
Subject Re: couchdb returning empty response
Date Mon, 20 Aug 2012 15:47:21 GMT
No, heart remains running.  I suppose that's why couchdb comes back up
almost immediately after being killed.  Looking at the logs again, I am
seeing the oom-killer killing couchdb, so I guess there are a few
things killing it due to lack of memory.
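
For reference, a quick way to confirm what the oom-killer did, assuming a
Debian-style syslog at /var/log/syslog (on other systems, dmesg alone will do):

$ dmesg | grep -i 'killed process'
$ grep -iE 'oom-killer|killed process' /var/log/syslog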

On Sun, Aug 19, 2012 at 5:17 PM, Robert Newson <rnewson@apache.org> wrote:
>
> Is one of those "other processes" called "heart", by any chance?
>
> B.
>
> On 19 Aug 2012, at 21:00, Tim Tisdall wrote:
>
>> stderr shows this when I hit an empty response:
>>
>> heart_beat_kill_pid = 17700
>> heart_beat_timeout = 11
>> Killed
>> heart: Sun Aug 19 18:23:54 2012: Erlang has closed.
>> heart: Sun Aug 19 18:23:55 2012: Executed "/usr/local/bin/couchdb -k".
>> Terminating.
>> heart_beat_kill_pid = 18390
>> heart_beat_timeout = 11
>> Killed
>> heart: Sun Aug 19 18:35:18 2012: Erlang has closed.
>> heart: Sun Aug 19 18:35:18 2012: Executed "/usr/local/bin/couchdb -k".
>> Terminating.
>> heart_beat_kill_pid = 18438
>> heart_beat_timeout = 11
>>
>>
>> So, it looks like the OS is killing the process because it's running
>> out of memory.  I can see in syslog that the oom-killer is killing
>> processes at exactly the same time.  What's strange, though, is
>> there's no mention of the oom-killer killing couchdb.  There are only
>> mentions of other processes being killed.
>>
>>
>> On Sun, Aug 19, 2012 at 8:15 AM, Robert Newson <rnewson@apache.org> wrote:
>>> 3.9Mb isn't large enough to trigger memory issues on its own on a node with
>>> 380M of RAM. Can you use 'top' or 'atop' to see what memory consumption was
>>> like before the crash? Erlang/OTP does usually report out of memory errors
>>> when it crashes (to stderr, which doesn't hit the .log file, iirc).
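>>>
>>> If top/atop isn't handy, a simple logging loop can be left running and read
>>> back after the next crash (a minimal sketch, assuming a stock Linux
>>> userland; mem.log is just an example file name):
>>>
>>> $ while true; do date; free -m; echo; sleep 60; done >> mem.log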
>>>
>>> B.
>>>
>>>
>>> On 19 Aug 2012, at 11:30, CGS wrote:
>>>
>>>> On Sat, Aug 18, 2012 at 9:15 PM, Tim Tisdall <tisdall@gmail.com> wrote:
>>>>
>>>>> So, it's possible that couchdb is running out of memory when
>>>>> processing a large JSON file?
>>>>
>>>>
>>>> Definitely.
>>>>
>>>>
>>>>> In the last example I gave, the JSON
>>>>> file is 3.9Mb, which I didn't think was too big, but I only have
>>>>> ~380Mb of RAM.  However, I am able to do several thousand similar
>>>>> _bulk_doc updates of around the same size before I see the error...
>>>>> are memory leaks possible with erlang?
>>>>
>>>>
>>>> It looks more like a RAM limitation per process. There may be a memory
>>>> leak, but I am not sure.
>>>>
>>>>
>>>>> Also, why is there nothing in
>>>>> the logs about running out of memory?  (shouldn't that be something
>>>>> the program is able to detect?)
>>>>>
>>>>
>>>> It seems CouchDB doesn't catch this type of warning.
>>>>
>>>>
>>>>>
>>>>> I switched over to using _bulk_docs because the database grew way too
>>>>> fast if I did only 1 update at a time.  I'm doing about 5000 - 200000
>>>>> document updates each time I run my script, so I've been doing the
>>>>> updates in batches of 150.
>>>>>
>>>>
>>>> I don't know about your requirements, but I remember a project in which I
>>>> created a round-robin buffer to feed the docs to CouchDB. In that project
>>>> I had to find a balance between the number of slices and the number of
>>>> docs per slice in order to minimize the insertion time. Maybe this idea
>>>> will help you in your project as well.
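>>>>
>>>> A minimal sketch of that kind of batched feed, assuming one JSON doc per
>>>> line in a file (docs.ndjson is a hypothetical name) and jq and curl
>>>> available:
>>>>
>>>> $ split -l 150 docs.ndjson batch_
>>>> $ for f in batch_* ; do
>>>>     jq -s '{docs: .}' "$f" |
>>>>       curl -sS -X POST http://127.0.0.1:5984/test/_bulk_docs \
>>>>            -H 'Content-Type: application/json' -d @- > /dev/null
>>>>   done
>>>>
>>>> Timing a few runs while varying the 150 is the optimization step I
>>>> described above.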
>>>>
>>>> CGS
>>>>
>>>>
>>>>
>>>>>
>>>>> -Tim
>>>>>
>>>>> On Fri, Aug 17, 2012 at 9:33 PM, CGS <cgsmcmlxxv@gmail.com> wrote:
>>>>>> I managed to reproduce the error:
>>>>>>
>>>>>> [Sat, 18 Aug 2012 00:57:38 GMT] [debug] [<0.170.0>] OAuth Params: []
>>>>>> [Sat, 18 Aug 2012 00:58:37 GMT] [debug] [<0.114.0>] Include Doc:
>>>>>> <<"_design/_replicator">> {1, <<91,250,44,153,238,254,43,46,180,150,45,181,10,163,207,212>>}
>>>>>> [Sat, 18 Aug 2012 00:58:37 GMT] [info] [<0.32.0>] Apache CouchDB has
>>>>>> started on http://0.0.0.0:5984/
>>>>>>
>>>>>> ...and I think I also identified the problem: a too-large JSON body.
>>>>>>
>>>>>> Here is how to reproduce the error:
>>>>>>
>>>>>> 1. CouchDB error level: debug
>>>>>> 2. an extra-huge JSON file:
>>>>>>
>>>>>>    echo -n "{\"docs\":[{\"key\":\"1\"}" > my_json.json &&
>>>>>>    for var in $(seq 2 2000000) ; do echo -n ",{\"key\":\"${var}\"}" >> my_json.json ; done &&
>>>>>>    echo -n "]}" >> my_json.json
>>>>>> 3. attempting to send it with curl (requires database "test" to already
>>>>>> exist, preferably empty):
>>>>>>
>>>>>> curl -X POST http://127.0.0.7:5984/test/_bulk_docs \
>>>>>>      -H 'Content-Type: application/json' -d @my_json.json > /dev/null
>>>>>>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>>>>>>                                  Dload  Upload   Total   Spent    Left  Speed
>>>>>> 100 33.2M    0     0  100 33.2M      0   856k  0:00:39  0:00:39 --:--:--     0
>>>>>> curl: (52) Empty reply from server
>>>>>>
>>>>>> Erlang shell report for the same problem:
>>>>>>
>>>>>> =INFO REPORT==== 18-Aug-2012::03:12:57 ===
>>>>>>   alarm_handler: {set,{system_memory_high_watermark,[]}}
>>>>>>
>>>>>> =INFO REPORT==== 18-Aug-2012::03:12:57 ===
>>>>>>   alarm_handler: {set,{process_memory_high_watermark,<0.149.0>}}
>>>>>> /usr/local/lib/erlang/lib/os_mon-2.2.9/priv/bin/memsup: Erlang has
>>>>>> closed.Erlang has closed
>>>>>>
>>>>>> Tim, try to split your JSON into smaller pieces. Bulk operations tend to
>>>>>> use a lot of memory.
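>>>>>>
>>>>>> For the my_json.json built above, one hedged way to do that split without
>>>>>> regenerating the file, assuming jq is available (the batch size of 150 is
>>>>>> just an example):
>>>>>>
>>>>>> total=$(jq '.docs | length' my_json.json)
>>>>>> for ((i=0; i<total; i+=150)); do
>>>>>>   jq -c "{docs: .docs[$i:$((i+150))]}" my_json.json |
>>>>>>     curl -sS -X POST http://127.0.0.1:5984/test/_bulk_docs \
>>>>>>          -H 'Content-Type: application/json' -d @- > /dev/null
>>>>>> done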
>>>>>>
>>>>>> The _design/_replicator error comes with the multipart file set by cURL
>>>>>> by default in such cases. Once a second piece is sent toward the server,
>>>>>> the crash is registered. The report for the first piece looks like:
>>>>>>
>>>>>> [Sat, 18 Aug 2012 00:57:38 GMT] [debug] [<0.170.0>] 'POST' /test/_bulk_docs
>>>>>> {1,1} from "127.0.0.1"
>>>>>>
>>>>>> I hope this info helps.
>>>>>>
>>>>>> CGS
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 17, 2012 at 7:30 PM, Tim Tisdall <tisdall@gmail.com> wrote:
>>>>>>
>>>>>>> Okay, so it always prints that _replicator line any time I manually
>>>>>>> restart the server.  I think it's just a standard logging message when
>>>>>>> the level is set to "debug".
>>>>>>>
>>>>>>> On Fri, Aug 17, 2012 at 1:13 PM, Tim Tisdall <tisdall@gmail.com> wrote:
>>>>>>>> No.  All my ids (except for design documents) are strings containing
>>>>>>>> integers.  Also, none of my design documents are called anything like
>>>>>>>> "_replicator".  The only thing with that name is in the _replicator
>>>>>>>> database which I'm not doing anything with.
>>>>>>>>
>>>>>>>> Why does it say "Include Doc"?  And what's that series of numbers
>>>>>>>> afterwards?  That log message seems to consistently occur just before
>>>>>>>> the log message about the server starting.  Is that just a normal
>>>>>>>> message you get when the server restarts and you have logging set to
>>>>>>>> "debug"?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Aug 17, 2012 at 1:03 PM, Robert Newson <rnewson@apache.org> wrote:
>>>>>>>>>
>>>>>>>>> Does app_stats_test contain a document called _design/_replicator or is
>>>>>>>>> a document with that id in the body of your bulk post?
>>>>>>>>>
>>>>>>>>> B.
>>>>>>>>>
>>>>>>>>> On 17 Aug 2012, at 17:52, Tim Tisdall wrote:
>>>>>>>>>
>>>>>>>>>> I do have UTF8 characters in the JSON, but isn't that acceptable?  I
>>>>>>>>>> have no problem retrieving UTF8 encoded content from the server and I
>>>>>>>>>> have a bunch of it saved in there already too.
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 17, 2012 at 10:35 AM, CGS <cgsmcmlxxv@gmail.com> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Do you by any chance have special characters (non-latin1 ones) in your
>>>>>>>>>>> JSON? That error looks strangely close to trying to transform a list of
>>>>>>>>>>> unicode characters into a binary. I might be wrong though.
>>>>>>>>>>>
>>>>>>>>>>> CGS
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Aug 17, 2012 at 4:09 PM, Tim Tisdall <tisdall@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I thought I added that to the init script before when you mentioned
>>>>>>>>>>>> it, but I checked and it was gone.  I added a "cd ~couchdb" in there
>>>>>>>>>>>> and now I no longer get eacces errors, but the process still crashes
>>>>>>>>>>>> with very little information:
>>>>>>>>>>>>
>>>>>>>>>>>> [Fri, 17 Aug 2012 14:01:44 GMT] [debug] [<0.1372.0>] 'POST'
>>>>>>>>>>>> /app_stats_test/_bulk_docs {1,0} from "127.0.0.1"
>>>>>>>>>>>> Headers: [{'Accept',"*/*"},
>>>>>>>>>>>>        {'Content-Length',"3902444"},
>>>>>>>>>>>>        {'Content-Type',"application/json"},
>>>>>>>>>>>>        {'Host',"localhost:5984"}]
>>>>>>>>>>>> [Fri, 17 Aug 2012 14:01:44 GMT] [debug] [<0.1372.0>] OAuth Params: []
>>>>>>>>>>>> [Fri, 17 Aug 2012 14:02:16 GMT] [debug] [<0.115.0>] Include Doc:
>>>>>>>>>>>> <<"_design/_replicator">> {1, <<91,250,44,153,238,254,43,46,180,150,45,181,10,163,207,212>>}
>>>>>>>>>>>> [Fri, 17 Aug 2012 14:02:17 GMT] [info] [<0.32.0>] Apache CouchDB has
>>>>>>>>>>>> started on http://127.0.0.1:5984/
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Someone mentioned seeing the JSON that I'm submitting...  Wouldn't
>>>>>>>>>>>> malformed JSON throw an error?
>>>>>>>>>>>>
>>>>>>>>>>>> -Tim
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Aug 17, 2012 at 4:33 AM, Robert Newson <rnewson@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've seen couchdb start despite the eacces errors before and tracked
>>>>>>>>>>>>> it down to the current working directory setting.  It seems that the
>>>>>>>>>>>>> cwd is searched first, and then erlang looks elsewhere.  So, if our
>>>>>>>>>>>>> startup script doesn't change it to somewhere that the couchdb user
>>>>>>>>>>>>> can read, you get spurious eacces errors.
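>>>>>>>>>>>>>
>>>>>>>>>>>>> The usual fix is a cd early in the startup script; a rough sketch
>>>>>>>>>>>>> (the path and fallback are assumptions for a typical install):
>>>>>>>>>>>>>
>>>>>>>>>>>>> # near the top of the init script, before the daemon starts: move
>>>>>>>>>>>>> # to a directory the couchdb user can read, so cwd lookups don't
>>>>>>>>>>>>> # produce spurious eacces errors
>>>>>>>>>>>>> cd ~couchdb || cd /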
>>>>>>>>>>>>>
>>>>>>>>>>>>> Don't ask me how I know this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> B.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 16 Aug 2012, at 20:19, Tim Tisdall wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Paul, did you ever solve the eacces problem you had described here:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/couchdb-user/201106.mbox/%3C4E0B304F.5080109@lymegreen.co.uk%3E
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I found that post from doing Google searches for my issue.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Aug 14, 2012 at 11:41 PM, Paul Davis
>>>>>>>>>>>>>> <paul.joseph.davis@gmail.com> wrote:
>>>>>>>>>>>>>>> On Tue, Aug 14, 2012 at 9:38 PM, Tim Tisdall <tisdall@gmail.com> wrote:
>>>>>>>>>>>>>>>> I'm still having problems with couchdb, but I'm trying out different
>>>>>>>>>>>>>>>> things to see if I can narrow down what the problem is...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I stopped using fsockopen() in PHP and am using curl now to hopefully
>>>>>>>>>>>>>>>> be able to see more debugging info.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I get an empty response when sending a POST to _bulk_docs.  From the
>>>>>>>>>>>>>>>> couch logs it seems like the server restarts in the middle of
>>>>>>>>>>>>>>>> processing the request.  Here's what I have in my logs:  (I have no
>>>>>>>>>>>>>>>> idea what the _replicator portion is about there, I'm currently not
>>>>>>>>>>>>>>>> using it)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:30 GMT] [debug] [<0.1255.0>] 'POST'
>>>>>>>>>>>>>>>> /app_stats_test/_bulk_docs {1,0} from "127.0.0.1"
>>>>>>>>>>>>>>>> Headers: [{'Accept',"*/*"},
>>>>>>>>>>>>>>>>       {'Content-Length',"2802300"},
>>>>>>>>>>>>>>>>       {'Content-Type',"application/json"},
>>>>>>>>>>>>>>>>       {'Host',"localhost:5984"}]
>>>>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:30 GMT] [debug] [<0.1255.0>] OAuth Params: []
>>>>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:45 GMT] [debug] [<0.115.0>] Include Doc:
>>>>>>>>>>>>>>>> <<"_design/_replicator">> {1, <<91,250,44,153,238,254,43,46,180,150,45,181,10,163,207,212>>}
>>>>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:45 GMT] [info] [<0.32.0>] Apache CouchDB has
>>>>>>>>>>>>>>>> started on http://127.0.0.1:5984/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In my code logs I have the following by running curl in verbose mode:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * About to connect() to localhost port 5984 (#0)
>>>>>>>>>>>>>>>> *   Trying 127.0.0.1... * connected
>>>>>>>>>>>>>>>> * Connected to localhost (127.0.0.1) port 5984 (#0)
>>>>>>>>>>>>>>>>> POST /app_stats_test/_bulk_docs HTTP/1.0
>>>>>>>>>>>>>>>> Host: localhost:5984
>>>>>>>>>>>>>>>> Accept: */*
>>>>>>>>>>>>>>>> Content-Type: application/json
>>>>>>>>>>>>>>>> Content-Length: 2802300
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * Empty reply from server
>>>>>>>>>>>>>>>> * Connection #0 to host localhost left intact
>>>>>>>>>>>>>>>> curl error: 52 : Empty reply from server
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I also tried using HTTP/1.1 and I get an empty response after
>>>>>>>>>>>>>>>> receiving only a "100 Continue", but the end result appears the same.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Tim
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If you have a request that triggers this, a good way to catch it is
>>>>>>>>>>>>>>> like so:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> $ /usr/local/bin/couchdb # or however you start it
>>>>>>>>>>>>>>> $ ps ax | grep beam.smp # Get the pid of couchdb
>>>>>>>>>>>>>>> $ gdb
>>>>>>>>>>>>>>>    (gdb) attach $pid # Where $pid was just found with ps. Might
>>>>>>>>>>>>>>>                      # throw up an access prompt
>>>>>>>>>>>>>>>    (gdb) continue
>>>>>>>>>>>>>>>    # At this point, run the command that makes couchdb reboot in a
>>>>>>>>>>>>>>>    # different console. If it happens you should see gdb notice the
>>>>>>>>>>>>>>>    # error. Then the following:
>>>>>>>>>>>>>>>    (gdb) t a a bt
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And that should spew out a bunch of stack traces. If you can get that,
>>>>>>>>>>>>>>> we should be able to fairly specifically narrow down the issue.
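>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A non-interactive sketch of the same steps, if you'd rather not
>>>>>>>>>>>>>>> drive gdb by hand ($pid again being the beam.smp pid found with ps):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> $ gdb -p $pid -batch -ex continue -ex 'thread apply all bt'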
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>
