couchdb-user mailing list archives

From: Tim Tisdall <tisdall@gmail.com>
Subject: Re: couchdb returning empty response
Date: Sun, 19 Aug 2012 20:00:04 GMT
stderr shows this when I hit an empty response:

heart_beat_kill_pid = 17700
heart_beat_timeout = 11
Killed
heart: Sun Aug 19 18:23:54 2012: Erlang has closed.
heart: Sun Aug 19 18:23:55 2012: Executed "/usr/local/bin/couchdb -k".
Terminating.
heart_beat_kill_pid = 18390
heart_beat_timeout = 11
Killed
heart: Sun Aug 19 18:35:18 2012: Erlang has closed.
heart: Sun Aug 19 18:35:18 2012: Executed "/usr/local/bin/couchdb -k".
Terminating.
heart_beat_kill_pid = 18438
heart_beat_timeout = 11


So it looks like the OS is killing the process because it's running
out of memory.  I can see in syslog that the oom-killer is killing
processes at exactly the same time.  What's strange, though, is that
there's no mention of the oom-killer killing couchdb; it only mentions
other processes being killed.
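
For anyone else digging into this, something like the following should
show what the oom-killer actually did (the syslog path may differ by
distro; this is just a sketch):

    # kernel messages about the oom-killer and its victims
    dmesg | grep -i -e 'oom' -e 'killed process'
    # same thing, with timestamps, from syslog
    grep -i -e 'oom-killer' -e 'out of memory' /var/log/syslog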


On Sun, Aug 19, 2012 at 8:15 AM, Robert Newson <rnewson@apache.org> wrote:
> 3.9Mb isn't large enough to trigger memory issues on its own on a node with 380M of ram.
> Can you use 'top' or 'atop' to see what memory consumption was like before the crash?
> Erlang/OTP does usually report out of memory errors when it crashes (to stderr, which
> doesn't hit the .log file, iirc).
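
Something like this could capture the memory picture right before the
crash (a rough sketch; it assumes the Erlang VM shows up as beam.smp
in ps, as it does for a standard CouchDB install):

    # append a memory snapshot of the Erlang VM every 5 seconds
    while true; do
        date
        ps -o pid,rss,vsz,args -C beam.smp
        sleep 5
    done >> beam-memory.log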
>
> B.
>
>
> On 19 Aug 2012, at 11:30, CGS wrote:
>
>> On Sat, Aug 18, 2012 at 9:15 PM, Tim Tisdall <tisdall@gmail.com> wrote:
>>
>>> So, it's possible that couchdb is running out of memory when
>>> processing a large JSON file?
>>
>>
>> Definitely.
>>
>>
>>> From the last example I gave, the JSON
>>> file is 3.9Mb, which I didn't think was too big, but I do only have
>>> ~380Mb of RAM.  However, I am able to do several thousand similar
>>> _bulk_docs updates of around the same size before I see the error...
>>> are memory leaks possible with Erlang?
>>
>>
>> It looks more like a RAM limitation per process. There may be a memory
>> leak, but I am not sure.
>>
>>
>>> Also, why is there nothing in
>>> the logs about running out of memory?  (shouldn't that be something
>>> the program is able to detect?)
>>>
>>
>> It seems CouchDB doesn't catch this type of warning.
>>
>>
>>>
>>> I switched over to using _bulk_docs because the database grew way too
>>> fast if I did only 1 update at a time.  I'm doing about 5000 - 200000
>>> document updates each time I run my script, so I've been doing the
>>> updates in batches of 150.
>>>
>>
>> I don't know about your requirements, but I remember a project in which I
>> created a round-robin buffer to feed the docs to CouchDB. In that project
>> I had to find the right balance between the number of slices and the
>> number of docs per slice in order to minimize the insertion time. Maybe
>> this idea will help you in your project as well.
>>
>> CGS
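
For what it's worth, a minimal shell version of that batching idea
(hypothetical names: it assumes docs.txt holds one JSON document per
line and that database "test" already exists):

    # cut the docs into 150-line pieces
    split -l 150 docs.txt batch_
    for f in batch_*; do
        # wrap each piece in a _bulk_docs envelope and post it
        printf '{"docs":[%s]}' "$(paste -sd, "$f")" > "$f.json"
        curl -X POST http://127.0.0.1:5984/test/_bulk_docs \
             -H 'Content-Type: application/json' -d @"$f.json"
    done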
>>
>>
>>
>>>
>>> -Tim
>>>
>>> On Fri, Aug 17, 2012 at 9:33 PM, CGS <cgsmcmlxxv@gmail.com> wrote:
>>>> I managed to reproduce the error:
>>>>
>>>> [Sat, 18 Aug 2012 00:57:38 GMT] [debug] [<0.170.0>] OAuth Params: []
>>>> [Sat, 18 Aug 2012 00:58:37 GMT] [debug] [<0.114.0>] Include Doc:
>>>> <<"_design/_replicator">> {1, <<91,250,44,153,238,254,43,46,
>>>> 180,150,45,181,10,163,207,212>>}
>>>> [Sat, 18 Aug 2012 00:58:37 GMT] [info] [<0.32.0>] Apache CouchDB has
>>>> started on http://0.0.0.0:5984/
>>>>
>>>> ...and I think I also identified the problem: a too-long/too-large JSON.
>>>>
>>>> Here is how to reproduce the error:
>>>>
>>>> 1. CouchDB error level: debug
>>>> 2. an extra-huge JSON file:
>>>>
>>>>      echo -n "{\"docs\":[{\"key\":\"1\"}" > my_json.json &&
>>>>      for var in $(seq 2 2000000) ; do
>>>>        echo -n ",{\"key\":\"${var}\"}" >> my_json.json ;
>>>>      done &&
>>>>      echo -n "]}" >> my_json.json
>>>> 3. attempting to send it with curl (requires database "test" to
>>>> already exist, preferably empty):
>>>>
>>>> curl -X POST http://127.0.0.7:5984/test/_bulk_docs \
>>>>      -H 'Content-Type: application/json' -d @my_json.json > /dev/null
>>>>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>>>>                                  Dload  Upload   Total   Spent    Left  Speed
>>>> 100 33.2M    0     0  100 33.2M      0   856k  0:00:39  0:00:39 --:--:--     0
>>>> curl: (52) Empty reply from server
>>>>
>>>> Erlang shell report for the same problem:
>>>>
>>>> =INFO REPORT==== 18-Aug-2012::03:12:57 ===
>>>>    alarm_handler: {set,{system_memory_high_watermark,[]}}
>>>>
>>>> =INFO REPORT==== 18-Aug-2012::03:12:57 ===
>>>>    alarm_handler: {set,{process_memory_high_watermark,<0.149.0>}}
>>>> /usr/local/lib/erlang/lib/os_mon-2.2.9/priv/bin/memsup: Erlang has closed.
>>>> Erlang has closed
>>>>
>>>> Tim, try to split your JSON into smaller pieces. Bulk operations tend
>>>> to use a lot of memory.
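
One way to re-slice an existing payload into smaller pieces (only a
sketch; it assumes the jq tool is available and reuses the my_json.json
file built above):

    # emit one doc per line, then regroup into 150-doc batches
    jq -c '.docs[]' my_json.json | split -l 150 - piece_
    for p in piece_*; do
        printf '{"docs":[%s]}' "$(paste -sd, "$p")" |
            curl -X POST http://127.0.0.1:5984/test/_bulk_docs \
                 -H 'Content-Type: application/json' -d @-
    done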
>>>>
>>>> The _design/_replicator message comes with the multipart file set by
>>>> cURL by default in such cases. Once the second piece is sent toward the
>>>> server, the crash is registered. The report for the first piece looks
>>>> like:
>>>>
>>>> [Sat, 18 Aug 2012 00:57:38 GMT] [debug] [<0.170.0>] 'POST' /test/_bulk_docs
>>>> {1,1} from "127.0.0.1"
>>>>
>>>> I hope this info may help.
>>>>
>>>> CGS
>>>>
>>>> On Fri, Aug 17, 2012 at 7:30 PM, Tim Tisdall <tisdall@gmail.com> wrote:
>>>>
>>>>> Okay, so it always states that _replicator line any time I manually
>>>>> restart the server.  I think it's just a standard logging message when
>>>>> the level is set to "debug".
>>>>>
>>>>>> On Fri, Aug 17, 2012 at 1:13 PM, Tim Tisdall <tisdall@gmail.com> wrote:
>>>>>>> No.  All my ids (except for design documents) are strings containing
>>>>>>> integers.  Also, none of my design documents are called anything like
>>>>>>> "_replicator".  The only thing with that name is in the _replicator
>>>>>>> database which I'm not doing anything with.
>>>>>>>
>>>>>>> Why does it say "Include Doc"?  And what's that series of numbers
>>>>>>> afterwards?  That log message seems to consistently occur just before
>>>>>>> the log message about the server starting.  Is that just a normal
>>>>>>> message you get when the server restarts and you have logging set to
>>>>>>> "debug"?
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 17, 2012 at 1:03 PM, Robert Newson <rnewson@apache.org> wrote:
>>>>>>>
>>>>>>> Does app_stats_test contain a document called _design/_replicator, or is
>>>>>>> a document with that id in the body of your bulk post?
>>>>>>>
>>>>>>> B.
>>>>>>>
>>>>>>> On 17 Aug 2012, at 17:52, Tim Tisdall wrote:
>>>>>>>
>>>>>>>> I do have UTF8 characters in the JSON, but isn't that acceptable?  I
>>>>>>>> have no problem retrieving UTF8 encoded content from the server and I
>>>>>>>> have a bunch of it saved in there already too.
>>>>>>>>
>>>>>>>> On Fri, Aug 17, 2012 at 10:35 AM, CGS <cgsmcmlxxv@gmail.com> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Do you have any special characters (non-Latin1 ones) in your JSON?
>>>>>>>>> That error looks strangely close to trying to transform a list of
>>>>>>>>> Unicode characters into a binary. I might be wrong though.
>>>>>>>>>
>>>>>>>>> CGS
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Aug 17, 2012 at 4:09 PM, Tim Tisdall <tisdall@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I thought I added that to the init script before when you mentioned
>>>>>>>>>> it, but I checked and it was gone.  I added a "cd ~couchdb" in there
>>>>>>>>>> and now I no longer get eaccess errors, but the process still crashes
>>>>>>>>>> with very little information:
>>>>>>>>>>
>>>>>>>>>> [Fri, 17 Aug 2012 14:01:44 GMT] [debug] [<0.1372.0>] 'POST'
>>>>>>>>>> /app_stats_test/_bulk_docs {1,0} from "127.0.0.1"
>>>>>>>>>> Headers: [{'Accept',"*/*"},
>>>>>>>>>>          {'Content-Length',"3902444"},
>>>>>>>>>>          {'Content-Type',"application/json"},
>>>>>>>>>>          {'Host',"localhost:5984"}]
>>>>>>>>>> [Fri, 17 Aug 2012 14:01:44 GMT] [debug] [<0.1372.0>] OAuth Params: []
>>>>>>>>>> [Fri, 17 Aug 2012 14:02:16 GMT] [debug] [<0.115.0>] Include Doc:
>>>>>>>>>> <<"_design/_replicator">> {1, <<91,250,44,153,238,254,43,46,
>>>>>>>>>> 180,150,45,181,10,163,207,212>>}
>>>>>>>>>> [Fri, 17 Aug 2012 14:02:17 GMT] [info] [<0.32.0>] Apache CouchDB has
>>>>>>>>>> started on http://127.0.0.1:5984/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Someone mentioned seeing the JSON that I'm submitting...  Wouldn't
>>>>>>>>>> malformed JSON throw an error?
>>>>>>>>>>
>>>>>>>>>> -Tim
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 17, 2012 at 4:33 AM, Robert Newson <rnewson@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I've seen couchdb start despite the eacces errors before and tracked
>>>>>>>>>>> it down to the current working directory setting. It seems that the
>>>>>>>>>>> cwd is searched first, and then erlang looks elsewhere. So, if our
>>>>>>>>>>> startup script doesn't change it to somewhere that the couchdb user
>>>>>>>>>>> can read, you get spurious eacces errors.
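
A minimal init-script fragment along those lines (illustrative only;
paths depend on the install):

    # make sure the cwd is somewhere the couchdb user can read
    # before the daemon starts
    cd ~couchdb || exit 1
    /usr/local/bin/couchdb -b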
>>>>>>>>>>>
>>>>>>>>>>> Don't ask me how I know this.
>>>>>>>>>>>
>>>>>>>>>>> B.
>>>>>>>>>>>
>>>>>>>>>>> On 16 Aug 2012, at 20:19, Tim Tisdall wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Paul, did you ever solve the eaccess problem you had described here:
>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/couchdb-user/201106.mbox/%3C4E0B304F.5080109@lymegreen.co.uk%3E
>>>>>>>>>>>>
>>>>>>>>>>>> I found that post from doing Google searches for my issue.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Aug 14, 2012 at 11:41 PM, Paul Davis
>>>>>>>>>>>> <paul.joseph.davis@gmail.com> wrote:
>>>>>>>>>>>>> On Tue, Aug 14, 2012 at 9:38 PM, Tim Tisdall <tisdall@gmail.com> wrote:
>>>>>>>>>>>>>> I'm still having problems with couchdb, but I'm trying out different
>>>>>>>>>>>>>> things to see if I can narrow down what the problem is...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I stopped using fsockopen() in PHP and am using curl now to hopefully
>>>>>>>>>>>>>> be able to see more debugging info.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I get an empty response when sending a POST to _bulk_docs.  From the
>>>>>>>>>>>>>> couch logs it seems like the server restarts in the middle of
>>>>>>>>>>>>>> processing the request.  Here's what I have in my logs:  (I have no
>>>>>>>>>>>>>> idea what the _replicator portion is about there, I'm currently not
>>>>>>>>>>>>>> using it)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:30 GMT] [debug] [<0.1255.0>] 'POST'
>>>>>>>>>>>>>> /app_stats_test/_bulk_docs {1,0} from "127.0.0.1"
>>>>>>>>>>>>>> Headers: [{'Accept',"*/*"},
>>>>>>>>>>>>>>          {'Content-Length',"2802300"},
>>>>>>>>>>>>>>          {'Content-Type',"application/json"},
>>>>>>>>>>>>>>          {'Host',"localhost:5984"}]
>>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:30 GMT] [debug] [<0.1255.0>] OAuth Params: []
>>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:45 GMT] [debug] [<0.115.0>] Include Doc:
>>>>>>>>>>>>>> <<"_design/_replicator">> {1, <<91,250,44,153,238,254,43,46,
>>>>>>>>>>>>>> 180,150,45,181,10,163,207,212>>}
>>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:45 GMT] [info] [<0.32.0>] Apache CouchDB has
>>>>>>>>>>>>>> started on http://127.0.0.1:5984/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In my code logs I have the following, by running curl in verbose mode:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * About to connect() to localhost port 5984 (#0)
>>>>>>>>>>>>>> *   Trying 127.0.0.1... * connected
>>>>>>>>>>>>>> * Connected to localhost (127.0.0.1) port 5984 (#0)
>>>>>>>>>>>>>>> POST /app_stats_test/_bulk_docs HTTP/1.0
>>>>>>>>>>>>>> Host: localhost:5984
>>>>>>>>>>>>>> Accept: */*
>>>>>>>>>>>>>> Content-Type: application/json
>>>>>>>>>>>>>> Content-Length: 2802300
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * Empty reply from server
>>>>>>>>>>>>>> * Connection #0 to host localhost left intact
>>>>>>>>>>>>>> curl error: 52 : Empty reply from server
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I also tried using HTTP/1.1, and I get an empty response after
>>>>>>>>>>>>>> receiving only a "100 Continue", but the end result appears the same.
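
One thing that might be worth ruling out: curl sends "Expect:
100-continue" by default for large POST bodies, and passing an empty
Expect header disables that handshake. A guess rather than a confirmed
fix, and body.json is a placeholder name:

    curl -X POST http://localhost:5984/app_stats_test/_bulk_docs \
         -H 'Content-Type: application/json' -H 'Expect:' \
         -d @body.json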
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Tim
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you have a request that triggers this, a good way to catch it is
>>>>>>>>>>>>> like such:
>>>>>>>>>>>>>
>>>>>>>>>>>>>  $ /usr/local/bin/couchdb # or however you start it
>>>>>>>>>>>>>  $ ps ax | grep beam.smp # Get the pid of couchdb
>>>>>>>>>>>>>  $ gdb
>>>>>>>>>>>>>     (gdb) attach $pid # Where $pid was just found with ps.
>>>>>>>>>>>>>                       # Might throw up an access prompt
>>>>>>>>>>>>>     (gdb) continue
>>>>>>>>>>>>>     # At this point, run the command that makes couchdb reboot in a
>>>>>>>>>>>>>     # different console. If it happens you should see gdb notice the
>>>>>>>>>>>>>     # error. Then the following:
>>>>>>>>>>>>>     (gdb) t a a bt
>>>>>>>>>>>>>
>>>>>>>>>>>>> And that should spew out a bunch of stack traces. If you can get
>>>>>>>>>>>>> that, we should be able to fairly specifically narrow down the issue.
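
A non-interactive variant of the same capture (assuming $pid holds the
beam.smp pid found with ps; note that a SIGKILL from the oom-killer
can't be intercepted this way, only catchable signals like SIGSEGV):

    gdb -p $pid -batch -ex continue -ex 'thread apply all bt'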
