couchdb-user mailing list archives

From Robert Newson <rnew...@apache.org>
Subject Re: Complex view generation stuck, never gets past silent crash (Raspberry Pi)
Date Wed, 05 Sep 2012 09:59:13 GMT

Did you need to disable the reduce_limit check, or is your maxAllowed clipping algorithm
keeping you on the safe side?
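
(By "clipping" I mean a reduce that caps the size of its own output, along these lines;
this is an illustrative sketch, not Nathan's actual code:)

    function (keys, values, rereduce) {
      var maxAllowed = 100;  // hypothetical cap on how many entries are kept
      var merged = [];
      values.forEach(function (v) {
        // at rereduce time each value is a previous (array) reduce output
        merged = merged.concat(rereduce ? v : [v]);
      });
      merged.sort();  // keep a deterministic subset; the ordering is illustrative
      return merged.slice(0, maxAllowed);
    }

Since the output can never outgrow maxAllowed entries, the reduce_limit check should
stay quiet.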

B.


On 5 Sep 2012, at 09:56, Nathan Vander Wilt wrote:

> I'm still trying to figure this issue out: what's causing it, and if/how I might work
> around it. I added well over 2GB of swap to supplement the RasPi's default 256MB real +
> 100MB swap, and the issue has not been mitigated at all.
> 
> After I query my view (now changed to add logs at the start/end of its map and reduce
> functions, so it's starting from scratch again) by hitting
> http://192.168.1.43:5984/loctest/_design/loclog/_view/by_time, I of course get a bunch
> of logs, ending with:
> 
> $ sudo tail /var/log/couchdb/couch.log
> [Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Finishing reduce with 2 values
> [Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Starting reduce with 2 values
> [Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Finishing reduce with 2 values
> [Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Starting reduce with 2 values
> [Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Finishing reduce with 2 values
> [Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Starting reduce with 1 values
> [Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Finishing reduce with 1 values
> [Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Starting reduce with 2 values
> [Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Finishing reduce with 2 values
> [Wed, 05 Sep 2012 04:59:01 GMT] [info] [<0.32.0>] Apache CouchDB has started on http://0.0.0.0:5984/
> 
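> The instrumentation itself is nothing fancy; each function just brackets its real work
> with log() calls, along these lines (sum() here is a stand-in for the actual float-heavy
> computation, and the map function gets the same treatment):
> 
>     function (keys, values, rereduce) {
>       log("Starting reduce with " + values.length + " values");
>       var result = sum(values);  // placeholder for the real reduction
>       log("Finishing reduce with " + values.length + " values");
>       return result;
>     }
> 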
> 
> The output of `top -u couchdb -d 1 -b > top_log.txt` tends to look something like this
> around the time it "runs out of memory". Memory doesn't actually seem terribly low: swap
> is barely touched, and a decent chunk of real memory is still available:
> 
> top - 21:14:27 up 22:29,  1 user,  load average: 4.23, 1.96, 0.94
> Tasks:  59 total,   2 running,  57 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  1.5 us,  5.0 sy,  0.0 ni,  0.0 id, 90.0 wa,  0.0 hi,  3.5 si,  0.0 st
> KiB Mem:    236880 total,   170896 used,    65984 free,      504 buffers
> KiB Swap:  2461680 total,    36704 used,  2424976 free,     6028 cached
> 
>  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
> 2982 couchdb   20   0  174m 123m  688 R   0.2 53.3   0:53.04 couchjs
> 3003 couchdb   20   0 23188 1596  724 D   0.2  0.7   2:15.39 couchjs
> 2566 couchdb   20   0  1716  312  240 S   0.1  0.1   0:00.05 couchdb
> 2937 couchdb   20   0  1632  416  356 S   0.1  0.2   0:00.10 heart
> 3027 couchdb   20   0  1716  512  448 S   0.1  0.2   0:00.01 sh
> 
> top - 21:14:28 up 22:29,  1 user,  load average: 4.23, 1.96, 0.94
> Tasks:  56 total,   1 running,  55 sleeping,   0 stopped,   0 zombie
> %Cpu(s): 42.9 us, 29.5 sy,  0.0 ni, 21.0 id,  5.7 wa,  0.0 hi,  1.0 si,  0.0 st
> KiB Mem:    236880 total,   170896 used,    65984 free,      504 buffers
> KiB Swap:  2461680 total,    36704 used,  2424976 free,     6028 cached
> 
>  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
> 2566 couchdb   20   0  1716  312  240 S   0.0  0.1   0:00.05 couchdb
> 3033 couchdb   20   0  3148  504  440 S   0.0  0.2   0:00.00 sleep
> 
> 
> top - 21:14:30 up 22:29,  1 user,  load average: 4.23, 1.96, 0.94
> Tasks:  56 total,   1 running,  55 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  1.9 us,  1.0 sy,  0.0 ni, 97.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> KiB Mem:    236880 total,    21552 used,   215328 free,      816 buffers
> KiB Swap:  2461680 total,     9484 used,  2452196 free,     9932 cached
> 
> 
> This is with Erlang R15B01 (1:15.b.1-dfsg-3), mozjs 1.8.5-1.0.0+dfsg-3.1 — both armhf.
> 
> `ulimit -a` says:
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 1849
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1024
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 8192
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 1849
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> 
> 
> And AFAICT, 32-bit Erlang builds may have an artificial heap limit of 2GB, but no longer
> the relatively low 2^28 I see way back in the release logs. Any ideas why I'm hitting a
> memory limit, or whether it's possible to tune CouchDB to bite off smaller pieces at a
> time? It'd be nice if I could get this design doc working along with the other, simpler
> ones.
> 
> thanks,
> -natevw
> 
> 
> 
> On Sep 1, 2012, at 8:54 AM, R.J. Steinert wrote:
> 
>> Natevw,
>> I'm also using CouchDB on a Pi. I'm new to CouchDB, so excuse me if this is a
>> dumb answer... Have you tried sprinkling your views with log() calls to
>> figure out where the view is getting stuck?
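>> 
>> Something along these lines in a map function will show up in couch.log as
>> "Log :: ..." entries (purely illustrative; emit whatever the view actually needs):
>> 
>>     function (doc) {
>>       log("map saw doc " + doc._id);
>>       emit(doc._id, null);
>>     }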
>> 
>> --
>> RJ Steinert
>> http://rjsteinert.com
>> 
>> 
>> 
>> On Sat, Sep 1, 2012 at 1:55 AM, Nathan Vander Wilt <nate-lists@calftrail.com> wrote:
>> 
>>> I've got CouchDB mostly working on my Raspberry Pi, simply via `apt-get install
>>> couchdb` plus the permissions fix Jens posted about recently.
>>> 
>>> However, I can't get a particularly complex design document to finish its
>>> initial view generation. (See
>>> https://github.com/natevw/LocLog/tree/master/views especially
>>> https://github.com/natevw/LocLog/blob/master/views/by_utc/reduce.js for
>>> source code.) Originally I was getting explicit timeout errors, so after
>>> unsuccessfully trying more conservative values I cranked os_process_timeout
>>> to 9000000. This got it a lot farther, but now it seems stuck with no
>>> indication of what's going wrong except the server suddenly drops out
>>> before getting respawned:
>>> 
>>> [Sat, 01 Sep 2012 04:55:55 GMT] [info] [<0.15090.1>] checkpointing view
>>> update at seq 2272 for loctest _design/loclog
>>> [Sat, 01 Sep 2012 05:00:01 GMT] [info] [<0.15090.1>] checkpointing view
>>> update at seq 2409 for loctest _design/loclog
>>> [Sat, 01 Sep 2012 05:09:49 GMT] [info] [<0.15090.1>] checkpointing view
>>> update at seq 2517 for loctest _design/loclog
>>> [Sat, 01 Sep 2012 05:14:46 GMT] [info] [<0.32.0>] Apache CouchDB has
>>> started on http://0.0.0.0:5984/
>>> 
>>> [Sat, 01 Sep 2012 05:19:50 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET
>>> /_active_tasks 200
>>> [Sat, 01 Sep 2012 05:19:55 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET
>>> /_active_tasks 200
>>> [Sat, 01 Sep 2012 05:20:00 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET
>>> /_active_tasks 200
>>> [Sat, 01 Sep 2012 05:20:05 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET
>>> /_active_tasks 200
>>> [Sat, 01 Sep 2012 05:20:10 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET
>>> /_active_tasks 200
>>> [Sat, 01 Sep 2012 05:20:15 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET
>>> /_active_tasks 200
>>> [Sat, 01 Sep 2012 05:20:55 GMT] [info] [<0.32.0>] Apache CouchDB has
>>> started on http://0.0.0.0:5984/
>>> 
>>> 
>>> Any idea how to determine what could cause this, and/or whether there's a remedy? My
>>> reduce function is rather float-heavy, and I suspect the package build is using
>>> soft-float math instead of hardware floating point (not sure how to verify), but
>>> regardless, the view made it this far, and seeing it simply fail without so much as a
>>> trace is a new one to me. I don't particularly suspect an out-of-memory condition: the
>>> whole database is <100MB (albeit snappy-compressed) and is spread across well over
>>> 5000 separate documents.
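>>> 
>>> For anyone reading along: a reduce like this has to handle the rereduce case too,
>>> roughly like so (heavily simplified sketch with illustrative field names, not the
>>> actual by_utc code; that's in the repo linked above):
>>> 
>>>     function (keys, values, rereduce) {
>>>       var acc = { count: 0, latSum: 0.0, lonSum: 0.0 };
>>>       values.forEach(function (v) {
>>>         if (rereduce) {
>>>           // combine previous reduce outputs
>>>           acc.count  += v.count;
>>>           acc.latSum += v.latSum;
>>>           acc.lonSum += v.lonSum;
>>>         } else {
>>>           // raw emitted values; lat/lon are hypothetical field names
>>>           acc.count  += 1;
>>>           acc.latSum += v.lat;
>>>           acc.lonSum += v.lon;
>>>         }
>>>       });
>>>       return acc;
>>>     }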
>>> 
>>> thanks,
>>> -natevw
> 

