couchdb-user mailing list archives

From Nathan Vander Wilt <nate-li...@calftrail.com>
Subject Re: Complex view generation stuck, never gets past silent crash (Raspberry Pi)
Date Wed, 05 Sep 2012 08:56:51 GMT
I'm still trying to figure this issue out: what's causing it, and if/how I might work around it.
I added well over 2GB of swap to supplement the RasPi's default 256MB real + 100MB swap, and
the issue has not been mitigated at all.

After I query my view (changed now to log at the start and end of its map and reduce functions,
so it's starting from scratch again) by hitting http://192.168.1.43:5984/loctest/_design/loclog/_view/by_time
I get, of course, a bunch of logs, ending with:

$ sudo tail /var/log/couchdb/couch.log
[Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Finishing reduce with 2 values
[Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Starting reduce with 2 values
[Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Finishing reduce with 2 values
[Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Starting reduce with 2 values
[Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Finishing reduce with 2 values
[Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Starting reduce with 1 values
[Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Finishing reduce with 1 values
[Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Starting reduce with 2 values
[Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Finishing reduce with 2 values
[Wed, 05 Sep 2012 04:59:01 GMT] [info] [<0.32.0>] Apache CouchDB has started on http://0.0.0.0:5984/
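
For readers following along, start/end logging of this kind could be sketched roughly as below. This is only an illustration under stated assumptions, not the actual LocLog source: `withLogging`, `emitLog`, `logLines` and the summing reduce body are hypothetical names and placeholders. The fallback logger exists only so the snippet also runs outside CouchDB's JavaScript view server, which provides `log()` globally.

```javascript
// CouchDB's JavaScript view server provides a global log(). Outside of it
// (e.g. under Node) fall back to an in-memory buffer. This fallback is an
// assumption made purely so the sketch is runnable on its own.
var logLines = [];
var emitLog = (typeof log === 'function')
  ? log
  : function (msg) { logLines.push(String(msg)); };

// Hypothetical helper: wraps a reduce function so that every invocation
// is logged on entry and on exit, along with the number of values it received.
function withLogging(reduceFn) {
  return function (keys, values, rereduce) {
    emitLog('Starting reduce with ' + values.length + ' values');
    var result = reduceFn(keys, values, rereduce);
    emitLog('Finishing reduce with ' + values.length + ' values');
    return result;
  };
}

// Placeholder reduce body (NOT the LocLog reduce): simply sums its values.
var loggedReduce = withLogging(function (keys, values, rereduce) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
});
```

A "Starting" line with no matching "Finishing" line before the restart would point at a single reduce call as the place where the process dies. In the tail above every start is paired with a finish.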


The output of `top -u couchdb -d 1 -b > top_log.txt` tends to look something like this
around the time it "runs out of memory". Memory doesn't really seem to be terribly low:
swap is almost entirely free, and even a decent chunk of real memory is still available:

top - 21:14:27 up 22:29,  1 user,  load average: 4.23, 1.96, 0.94
Tasks:  59 total,   2 running,  57 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.5 us,  5.0 sy,  0.0 ni,  0.0 id, 90.0 wa,  0.0 hi,  3.5 si,  0.0 st
KiB Mem:    236880 total,   170896 used,    65984 free,      504 buffers
KiB Swap:  2461680 total,    36704 used,  2424976 free,     6028 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 2982 couchdb   20   0  174m 123m  688 R   0.2 53.3   0:53.04 couchjs
 3003 couchdb   20   0 23188 1596  724 D   0.2  0.7   2:15.39 couchjs
 2566 couchdb   20   0  1716  312  240 S   0.1  0.1   0:00.05 couchdb
 2937 couchdb   20   0  1632  416  356 S   0.1  0.2   0:00.10 heart
 3027 couchdb   20   0  1716  512  448 S   0.1  0.2   0:00.01 sh

top - 21:14:28 up 22:29,  1 user,  load average: 4.23, 1.96, 0.94
Tasks:  56 total,   1 running,  55 sleeping,   0 stopped,   0 zombie
%Cpu(s): 42.9 us, 29.5 sy,  0.0 ni, 21.0 id,  5.7 wa,  0.0 hi,  1.0 si,  0.0 st
KiB Mem:    236880 total,   170896 used,    65984 free,      504 buffers
KiB Swap:  2461680 total,    36704 used,  2424976 free,     6028 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 2566 couchdb   20   0  1716  312  240 S   0.0  0.1   0:00.05 couchdb
 3033 couchdb   20   0  3148  504  440 S   0.0  0.2   0:00.00 sleep


top - 21:14:30 up 22:29,  1 user,  load average: 4.23, 1.96, 0.94
Tasks:  56 total,   1 running,  55 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.9 us,  1.0 sy,  0.0 ni, 97.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:    236880 total,    21552 used,   215328 free,      816 buffers
KiB Swap:  2461680 total,     9484 used,  2452196 free,     9932 cached
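
To put numbers on those snapshots, a small script along these lines could pull the memory summary out of `top_log.txt`. It is a rough sketch: `parseTopMemory` is a hypothetical helper, and the regular expression assumes the exact `KiB Mem:` / `KiB Swap:` layout shown above, which differs between versions of top.

```javascript
// Parse the "KiB Mem:" and "KiB Swap:" summary lines of `top -b` output.
// If the text contains several snapshots, the last one seen wins.
function parseTopMemory(text) {
  var out = {};
  var re = /^KiB (Mem|Swap):\s+(\d+) total,\s+(\d+) used,\s+(\d+) free/gm;
  var m;
  while ((m = re.exec(text)) !== null) {
    var total = Number(m[2]);
    var free = Number(m[4]);
    out[m[1].toLowerCase()] = {
      totalKiB: total,
      usedKiB: Number(m[3]),
      freeKiB: free,
      freePct: 100 * free / total  // share of the total that is still free
    };
  }
  return out;
}

// Summary lines copied from the first snapshot above.
var sample = [
  'KiB Mem:    236880 total,   170896 used,    65984 free,      504 buffers',
  'KiB Swap:  2461680 total,    36704 used,  2424976 free,     6028 cached'
].join('\n');

var snapshot = parseTopMemory(sample);
```

For the first snapshot this works out to roughly 28% of real memory and over 98% of swap still free just before the couchjs processes disappear.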


This is with Erlang R15B01 (1:15.b.1-dfsg-3), mozjs 1.8.5-1.0.0+dfsg-3.1 — both armhf.

`ulimit -a` says:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1849
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1849
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


And AFAICT, 32-bit Erlang builds may have an artificial heap limit of 2GB, but no longer
the relatively low 2^28 I see way back in the release logs. Any ideas why I'm hitting the
memory limit, or whether it's possible to tune CouchDB to bite off smaller pieces at a time? It'd
be nice if I could get this design doc working along with the other, simpler ones.

thanks,
-natevw



On Sep 1, 2012, at 8:54 AM, R.J. Steinert wrote:

> Natevw,
> I'm also using CouchDB on Pi.  I'm new to CouchDB so excuse me if this is a
> dumb answer... Have you tried sprinkling your views with log() functions to
> figure out where the view is getting stuck?
> 
> --
> RJ Steinert
> http://rjsteinert.com
> 
> 
> 
> On Sat, Sep 1, 2012 at 1:55 AM, Nathan Vander Wilt <nate-lists@calftrail.com> wrote:
> 
>> I've got CouchDB mostly working on my Raspberry Pi, simply via `apt-get
>> install couchdb` plus the permissions fix Jens posted about recently.
>> 
>> However, I can't get a particularly complex design document to finish its
>> initial view generation. (See
>> https://github.com/natevw/LocLog/tree/master/views especially
>> https://github.com/natevw/LocLog/blob/master/views/by_utc/reduce.js for
>> source code.) Originally I was getting explicit timeout errors, so after
>> unsuccessfully trying more conservative values I cranked os_process_timeout
>> to 9000000. This got it a lot farther, but now it seems stuck with no
>> indication of what's going wrong except the server suddenly drops out
>> before getting respawned:
>> 
>> [Sat, 01 Sep 2012 04:55:55 GMT] [info] [<0.15090.1>] checkpointing view
>> update at seq 2272 for loctest _design/loclog
>> [Sat, 01 Sep 2012 05:00:01 GMT] [info] [<0.15090.1>] checkpointing view
>> update at seq 2409 for loctest _design/loclog
>> [Sat, 01 Sep 2012 05:09:49 GMT] [info] [<0.15090.1>] checkpointing view
>> update at seq 2517 for loctest _design/loclog
>> [Sat, 01 Sep 2012 05:14:46 GMT] [info] [<0.32.0>] Apache CouchDB has
>> started on http://0.0.0.0:5984/
>> 
>> [Sat, 01 Sep 2012 05:19:50 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET
>> /_active_tasks 200
>> [Sat, 01 Sep 2012 05:19:55 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET
>> /_active_tasks 200
>> [Sat, 01 Sep 2012 05:20:00 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET
>> /_active_tasks 200
>> [Sat, 01 Sep 2012 05:20:05 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET
>> /_active_tasks 200
>> [Sat, 01 Sep 2012 05:20:10 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET
>> /_active_tasks 200
>> [Sat, 01 Sep 2012 05:20:15 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET
>> /_active_tasks 200
>> [Sat, 01 Sep 2012 05:20:55 GMT] [info] [<0.32.0>] Apache CouchDB has
>> started on http://0.0.0.0:5984/
>> 
>> 
>> Any idea how to determine what could cause this, and/or if there's a
>> remedy? My reduce function is rather float-heavy and I suspect perhaps the
>> package build is using soft floats instead of hardware (not sure how to
>> verify), but regardless the view made it this far and to see it simply fail
>> without so much as a trace is a new one to me. I don't particularly suspect
>> an out-of-memory condition — the whole database is <100MB (albeit snappy
>> compressed) and this is spread across well over 5000 separate documents.
>> 
>> thanks,
>> -natevw

