couchdb-user mailing list archives

From Robert Newson <rnew...@apache.org>
Subject Re: couchdb returning empty response
Date Sun, 19 Aug 2012 21:17:16 GMT

Is one of those "other processes" called "heart", by any chance?

B.

On 19 Aug 2012, at 21:00, Tim Tisdall wrote:

> stderr shows this when I hit an empty response:
> 
> heart_beat_kill_pid = 17700
> heart_beat_timeout = 11
> Killed
> heart: Sun Aug 19 18:23:54 2012: Erlang has closed.
> heart: Sun Aug 19 18:23:55 2012: Executed "/usr/local/bin/couchdb -k".
> Terminating.
> heart_beat_kill_pid = 18390
> heart_beat_timeout = 11
> Killed
> heart: Sun Aug 19 18:35:18 2012: Erlang has closed.
> heart: Sun Aug 19 18:35:18 2012: Executed "/usr/local/bin/couchdb -k".
> Terminating.
> heart_beat_kill_pid = 18438
> heart_beat_timeout = 11
> 
> 
> So, it looks like the OS is killing the process because it's running
> out of memory.  I can see in syslog that the oom-killer is killing
> processes at exactly the same time.  What's strange, though, is
> there's no mention of oom-killer killing couchdb.  There's only
> mentions of other processes being killed.
> 
> 
> On Sun, Aug 19, 2012 at 8:15 AM, Robert Newson <rnewson@apache.org> wrote:
>> 3.9MB isn't large enough to trigger memory issues on its own on a node with 380MB
>> of RAM. Can you use 'top' or 'atop' to see what memory consumption was like before the
>> crash? Erlang/OTP does usually report out-of-memory errors when it crashes (to stderr,
>> which doesn't hit the .log file, iirc).
>> 
>> B.
>> 
>> 
>> On 19 Aug 2012, at 11:30, CGS wrote:
>> 
>>> On Sat, Aug 18, 2012 at 9:15 PM, Tim Tisdall <tisdall@gmail.com> wrote:
>>> 
>>>> So, it's possible that couchdb is running out of memory when
>>>> processing a large JSON file?
>>> 
>>> 
>>> Definitely.
>>> 
>>> 
>>>> In the last example I gave, the JSON
>>>> file is 3.9MB, which I didn't think was too big, but I do only have
>>>> ~380MB of RAM.  However, I am able to do several thousand similar
>>>> _bulk_docs updates of around the same size before I see the error...
>>>> are memory leaks possible with Erlang?
>>> 
>>> 
>>> It looks more like a RAM limitation per process. There may be a memory
>>> leak, but I am not sure.
>>> 
>>> 
>>>> Also, why is there nothing in
>>>> the logs about running out of memory?  (shouldn't that be something
>>>> the program is able to detect?)
>>>> 
>>> 
>>> It seems CouchDB doesn't catch this type of warning.
>>> 
>>> 
>>>> 
>>>> I switched over to using _bulk_docs because the database grew way too
>>>> fast if I did only 1 update at a time.  I'm doing about 5000 - 200000
>>>> document updates each time I run my script, so I've been doing the
>>>> updates in batches of 150.
>>>> 
>>> 
>>> I don't know about your requirements, but I remember a project in which I
>>> created a round-robin buffer to feed the docs to CouchDB. In that
>>> project I had to find a balance between the number of slices and
>>> the number of docs I could hold before feeding them to CouchDB, in order
>>> to minimize the insertion time. Maybe this idea will help you in your
>>> project as well.
>>> 
>>> CGS
>>> 
>>> 
>>> 
>>>> 
>>>> -Tim
>>>> 
>>>> On Fri, Aug 17, 2012 at 9:33 PM, CGS <cgsmcmlxxv@gmail.com> wrote:
>>>>> I managed to reproduce the error:
>>>>> 
>>>>> [Sat, 18 Aug 2012 00:57:38 GMT] [debug] [<0.170.0>] OAuth Params: []
>>>>> [Sat, 18 Aug 2012 00:58:37 GMT] [debug] [<0.114.0>] Include Doc:
>>>>> <<"_design/_replicator">> {1,
>>>>>     <<91,250,44,153,
>>>>>       238,254,43,46,
>>>>>       180,150,45,181,
>>>>>       10,163,207,212>>}
>>>>> [Sat, 18 Aug 2012 00:58:37 GMT] [info] [<0.32.0>] Apache CouchDB has
>>>>> started on http://0.0.0.0:5984/
>>>>> 
>>>>> ...and I think I identified also the problem: too long/large JSON.
>>>>> 
>>>>> Here is how to reproduce the error:
>>>>> 
>>>>> 1. CouchDB log level: debug
>>>>> 2. an extra-huge JSON file:
>>>>> 
>>>>> echo -n "{\"docs\":[{\"key\":\"1\"}" > my_json.json &&
>>>>>   for var in $(seq 2 2000000) ; do
>>>>>     echo -n ",{\"key\":\"${var}\"}" >> my_json.json ;
>>>>>   done &&
>>>>>   echo -n "]}" >> my_json.json
>>>>> 
>>>>> 3. attempting to send it with curl (requires the database "test" to
>>>>> exist already, preferably empty):
>>>>> 
>>>>> curl -X POST http://127.0.0.7:5984/test/_bulk_docs \
>>>>>   -H 'Content-Type: application/json' -d @my_json.json > /dev/null
>>>>>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>>>>>                                  Dload  Upload   Total   Spent    Left  Speed
>>>>> 100 33.2M    0     0  100 33.2M      0   856k  0:00:39  0:00:39 --:--:--     0
>>>>> curl: (52) Empty reply from server
>>>>> 
>>>>> Erlang shell report for the same problem:
>>>>> 
>>>>> =INFO REPORT==== 18-Aug-2012::03:12:57 ===
>>>>>   alarm_handler: {set,{system_memory_high_watermark,[]}}
>>>>> 
>>>>> =INFO REPORT==== 18-Aug-2012::03:12:57 ===
>>>>>   alarm_handler: {set,{process_memory_high_watermark,<0.149.0>}}
>>>>> /usr/local/lib/erlang/lib/os_mon-2.2.9/priv/bin/memsup: Erlang has
>>>>> closed.Erlang has closed
>>>>> 
>>>>> Tim, try to split your JSON into smaller pieces. Bulk operations tend to
>>>>> use a lot of memory.
>>>>> 
>>>>> The _design/_replicator error comes with the multipart file set by cURL by
>>>>> default in such cases. Once a second piece is sent toward the server, the
>>>>> crash is registered. The first piece report looks like:
>>>>> 
>>>>> [Sat, 18 Aug 2012 00:57:38 GMT] [debug] [<0.170.0>] 'POST' /test/_bulk_docs
>>>>> {1,1} from "127.0.0.1"
>>>>> 
>>>>> I hope this info may help.
>>>>> 
>>>>> CGS
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Fri, Aug 17, 2012 at 7:30 PM, Tim Tisdall <tisdall@gmail.com> wrote:
>>>>> 
>>>>>> Okay, so it always logs that _replicator line any time I manually
>>>>>> restart the server.  I think it's just a standard logging message when
>>>>>> the level is set to "debug".
>>>>>> 
>>>>>> On Fri, Aug 17, 2012 at 1:13 PM, Tim Tisdall <tisdall@gmail.com> wrote:
>>>>>>> No.  All my ids (except for design documents) are strings containing
>>>>>>> integers.  Also, none of my design documents are called anything like
>>>>>>> "_replicator".  The only thing with that name is in the _replicator
>>>>>>> database which I'm not doing anything with.
>>>>>>> 
>>>>>>> Why does it say "Include Doc"?  And what's that series of numbers
>>>>>>> afterwards?  That log message seems to consistently occur just before
>>>>>>> the log message about the server starting.  Is that just a normal
>>>>>>> message you get when the server restarts and you have logging set to
>>>>>>> "debug"?
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Aug 17, 2012 at 1:03 PM, Robert Newson <rnewson@apache.org> wrote:
>>>>>>>> 
>>>>>>>> Does app_stats_test contain a document called _design/_replicator, or is
>>>>>>>> a document with that id in the body of your bulk post?
>>>>>>>> 
>>>>>>>> B.
>>>>>>>> 
>>>>>>>> On 17 Aug 2012, at 17:52, Tim Tisdall wrote:
>>>>>>>> 
>>>>>>>>> I do have UTF-8 characters in the JSON, but isn't that acceptable?  I
>>>>>>>>> have no problem retrieving UTF-8 encoded content from the server and I
>>>>>>>>> have a bunch of it saved in there already too.
>>>>>>>>> 
>>>>>>>>> On Fri, Aug 17, 2012 at 10:35 AM, CGS <cgsmcmlxxv@gmail.com> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> Do you by any chance have special characters (non-latin1 ones) in your
>>>>>>>>>> JSON? That error looks strangely close to trying to transform a list of
>>>>>>>>>> Unicode characters into a binary. I might be wrong though.
>>>>>>>>>> 
>>>>>>>>>> CGS
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Fri, Aug 17, 2012 at 4:09 PM, Tim Tisdall <tisdall@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I thought I added that to the init script before when you mentioned
>>>>>>>>>>> it, but I checked and it was gone.  I added a "cd ~couchdb" in there
>>>>>>>>>>> and now I no longer get eacces errors, but the process still crashes
>>>>>>>>>>> with very little information:
>>>>>>>>>>> 
>>>>>>>>>>> [Fri, 17 Aug 2012 14:01:44 GMT] [debug] [<0.1372.0>] 'POST'
>>>>>>>>>>> /app_stats_test/_bulk_docs {1,0} from "127.0.0.1"
>>>>>>>>>>> Headers: [{'Accept',"*/*"},
>>>>>>>>>>>        {'Content-Length',"3902444"},
>>>>>>>>>>>        {'Content-Type',"application/json"},
>>>>>>>>>>>        {'Host',"localhost:5984"}]
>>>>>>>>>>> [Fri, 17 Aug 2012 14:01:44 GMT] [debug] [<0.1372.0>] OAuth Params: []
>>>>>>>>>>> [Fri, 17 Aug 2012 14:02:16 GMT] [debug] [<0.115.0>] Include Doc:
>>>>>>>>>>> <<"_design/_replicator">> {1,
>>>>>>>>>>>     <<91,250,44,153,
>>>>>>>>>>>       238,254,43,46,
>>>>>>>>>>>       180,150,45,181,
>>>>>>>>>>>       10,163,207,212>>}
>>>>>>>>>>> [Fri, 17 Aug 2012 14:02:17 GMT] [info] [<0.32.0>] Apache CouchDB has
>>>>>>>>>>> started on http://127.0.0.1:5984/
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Someone mentioned seeing the JSON that I'm submitting...  Wouldn't
>>>>>>>>>>> malformed JSON throw an error?
>>>>>>>>>>> 
>>>>>>>>>>> -Tim
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Aug 17, 2012 at 4:33 AM, Robert Newson <rnewson@apache.org> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> I've seen couchdb start despite the eacces errors before and tracked it
>>>>>>>>>>>> down to the current working directory setting. It seems that the cwd is
>>>>>>>>>>>> searched first, and then erlang looks elsewhere. So, if our startup script
>>>>>>>>>>>> doesn't change it to somewhere that the couchdb user can read, you get
>>>>>>>>>>>> spurious eacces errors.
>>>>>>>>>>>> 
>>>>>>>>>>>> Don't ask me how I know this.
>>>>>>>>>>>> 
>>>>>>>>>>>> B.
>>>>>>>>>>>> 
>>>>>>>>>>>> On 16 Aug 2012, at 20:19, Tim Tisdall wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Paul, did you ever solve the eacces problem you had described here:
>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/couchdb-user/201106.mbox/%3C4E0B304F.5080109@lymegreen.co.uk%3E
>>>>>>>>>>>>> I found that post from doing Google searches for my issue.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Tue, Aug 14, 2012 at 11:41 PM, Paul Davis
>>>>>>>>>>>>> <paul.joseph.davis@gmail.com> wrote:
>>>>>>>>>>>>>> On Tue, Aug 14, 2012 at 9:38 PM, Tim Tisdall <tisdall@gmail.com> wrote:
>>>>>>>>>>>>>>> I'm still having problems with couchdb, but I'm trying out different
>>>>>>>>>>>>>>> things to see if I can narrow down what the problem is...
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I stopped using fsockopen() in PHP and am using curl now to hopefully
>>>>>>>>>>>>>>> be able to see more debugging info.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I get an empty response when sending a POST to _bulk_docs.  From the
>>>>>>>>>>>>>>> couch logs it seems like the server restarts in the middle of
>>>>>>>>>>>>>>> processing the request.  Here's what I have in my logs:  (I have no
>>>>>>>>>>>>>>> idea what the _replicator portion is about there, I'm currently not
>>>>>>>>>>>>>>> using it)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:30 GMT] [debug] [<0.1255.0>] 'POST'
>>>>>>>>>>>>>>> /app_stats_test/_bulk_docs {1,0} from "127.0.0.1"
>>>>>>>>>>>>>>> Headers: [{'Accept',"*/*"},
>>>>>>>>>>>>>>>       {'Content-Length',"2802300"},
>>>>>>>>>>>>>>>       {'Content-Type',"application/json"},
>>>>>>>>>>>>>>>       {'Host',"localhost:5984"}]
>>>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:30 GMT] [debug] [<0.1255.0>] OAuth Params: []
>>>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:45 GMT] [debug] [<0.115.0>] Include Doc:
>>>>>>>>>>>>>>> <<"_design/_replicator">> {1,
>>>>>>>>>>>>>>>     <<91,250,44,153,
>>>>>>>>>>>>>>>       238,254,43,46,
>>>>>>>>>>>>>>>       180,150,45,181,
>>>>>>>>>>>>>>>       10,163,207,212>>}
>>>>>>>>>>>>>>> [Wed, 15 Aug 2012 02:27:45 GMT] [info] [<0.32.0>] Apache CouchDB has
>>>>>>>>>>>>>>> started on http://127.0.0.1:5984/
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> In my code logs I have the following by running curl in verbose mode:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> * About to connect() to localhost port 5984 (#0)
>>>>>>>>>>>>>>> *   Trying 127.0.0.1... * connected
>>>>>>>>>>>>>>> * Connected to localhost (127.0.0.1) port 5984 (#0)
>>>>>>>>>>>>>>> > POST /app_stats_test/_bulk_docs HTTP/1.0
>>>>>>>>>>>>>>> Host: localhost:5984
>>>>>>>>>>>>>>> Accept: */*
>>>>>>>>>>>>>>> Content-Type: application/json
>>>>>>>>>>>>>>> Content-Length: 2802300
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> * Empty reply from server
>>>>>>>>>>>>>>> * Connection #0 to host localhost left intact
>>>>>>>>>>>>>>> curl error: 52 : Empty reply from server
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I also tried using HTTP/1.1 and I get an empty response after
>>>>>>>>>>>>>>> receiving only a "100 Continue", but the end result appears the same.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> -Tim
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> If you have a request that triggers this, a good way to catch it is
>>>>>>>>>>>>>> like this:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> $ /usr/local/bin/couchdb # or however you start it
>>>>>>>>>>>>>> $ ps ax | grep beam.smp # Get the pid of couchdb
>>>>>>>>>>>>>> $ gdb
>>>>>>>>>>>>>>    (gdb) attach $pid # Where $pid was just found with ps. Might
>>>>>>>>>>>>>>    # throw up an access prompt
>>>>>>>>>>>>>>    (gdb) continue
>>>>>>>>>>>>>>    # At this point, run the command that makes couchdb reboot in a
>>>>>>>>>>>>>>    # different console. If it happens you should see gdb notice the
>>>>>>>>>>>>>>    # error. Then the following:
>>>>>>>>>>>>>>    (gdb) thread apply all bt
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> And that should spew out a bunch of stack traces. If you can get that
>>>>>>>>>>>>>> we should be able to fairly specifically narrow down the issue.
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>> 
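
The advice given in the thread (split a large _bulk_docs update into smaller
pieces) can be sketched roughly as follows in Python. This is only an
illustration under stated assumptions: the function names bulk_payloads and
post_bulk, the 1 MB size cap, and the URL in the usage comment are
hypothetical and not part of CouchDB; the batch size of 150 is the one Tim
mentions using.

```python
import json
import urllib.request


def bulk_payloads(docs, max_docs=150, max_bytes=1_000_000):
    """Yield _bulk_docs request bodies (bytes).

    Each body holds at most max_docs documents and, unless a single
    document is itself larger, roughly max_bytes of encoded JSON.
    Both limits are illustrative assumptions, not CouchDB settings.
    """
    batch, size = [], 0
    for doc in docs:
        encoded = json.dumps(doc)
        doc_size = len(encoded.encode("utf-8")) + 1  # +1 for the separating comma
        # Flush the current batch before it would exceed either limit.
        if batch and (len(batch) >= max_docs or size + doc_size > max_bytes):
            yield ('{"docs":[' + ",".join(batch) + "]}").encode("utf-8")
            batch, size = [], 0
        batch.append(encoded)
        size += doc_size
    if batch:
        yield ('{"docs":[' + ",".join(batch) + "]}").encode("utf-8")


def post_bulk(url, payload):
    """POST one payload to a _bulk_docs URL and return the decoded JSON reply."""
    req = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Usage (assumes a local CouchDB with an existing database named "test"):
#
#   docs = ({"key": str(i)} for i in range(1, 2000001))
#   for payload in bulk_payloads(docs):
#       post_bulk("http://127.0.0.1:5984/test/_bulk_docs", payload)
```

Because the documents are consumed from an iterator and sent one batch at a
time, neither the client nor the server has to hold the whole 33 MB body in
memory at once. The right limits depend on the RAM available to the node.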

