couchdb-user mailing list archives

From Nathan Vander Wilt <nate-li...@calftrail.com>
Subject Re: Complex view generation stuck, never gets past silent crash (Raspberry Pi)
Date Wed, 05 Sep 2012 17:07:59 GMT
On Sep 5, 2012, at 2:59 AM, Robert Newson wrote:

> 
> Did you need to disable reduce_overflow_limit or is your maxAllowed clipping algorithm
> keeping you on the safe side?

No, and yes. I'm not getting the reduce_overflow_limit error; it's an Erlang VM crash, e.g.
"Cannot allocate 62690240 bytes of memory (of type "heap")." [sample erl_crash.dump at https://gist.github.com/3617465]

The maxAllowed has worked for me; I've successfully generated this view a number of times
on other machines. For last night's test run I actually hardcoded `maxAllowed = 1` and it
still crashed, leaving the view partially stale with the exact same number (2990) of updates
left to process as before. It seems that either: a) Erlang has its own memory limit and/or
refuses to use swap, or b) CouchDB is suddenly trying to bite off well over 2GB of data
(from a 100MB database?!) at a time.
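
For readers unfamiliar with the idea, here is a minimal sketch of what a `maxAllowed`-style clip in a reduce function could look like. It is not the LocLog `reduce.js`: only the name `maxAllowed` comes from this thread, while `clip`, the `count`/`samples` result shape, and the cap of 8 are assumptions made up for illustration.

```javascript
// Hypothetical sketch, NOT the actual LocLog reduce.js.
// Only the name `maxAllowed` comes from the thread; everything else is invented.
var maxAllowed = 8; // assumed cap on samples kept in each reduce result

// Thin a list down to at most `limit` items by keeping evenly spaced entries.
function clip(samples, limit) {
  if (samples.length <= limit) return samples;
  var kept = [];
  var step = samples.length / limit;
  for (var i = 0; i < limit; i += 1) {
    kept.push(samples[Math.floor(i * step)]);
  }
  return kept;
}

// CouchDB calls a reduce function as (keys, values, rereduce). When rereduce
// is true, `values` holds earlier outputs of this same function.
function reduce(keys, values, rereduce) {
  var count = 0;
  var merged = [];
  values.forEach(function (v) {
    if (rereduce) {
      count += v.count;
      merged = merged.concat(v.samples);
    } else {
      count += 1;
      merged.push(v);
    }
  });
  // The result never carries more than maxAllowed samples, however many
  // rows were fed in, so the output size stays bounded.
  return { count: count, samples: clip(merged, maxAllowed) };
}
```

The point of the clip is only that the reduce output stops growing with the input; whether a given cap is small enough for a particular server configuration is something to check by experiment.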

I've generated this view on an older MacBook Air with only 2GB of RAM (with usually barely
any free because Safari/etc. is a hog) and while it was slow it finished every time without
issue.

thanks,
-natevw



> On 5 Sep 2012, at 09:56, Nathan Vander Wilt wrote:
> 
>> I'm still trying to figure this issue out: what's causing it, and if/how I might work around it.
>> I added well over 2GB of swap to supplement the RasPi's default 256MB real + 100MB swap, and
>> the issue has not been mitigated at all.
>> 
>> After I query my view (changed now to add logs at the start/end of its map and its reduce
>> function, so it's starting from scratch again) by hitting
>> http://192.168.1.43:5984/loctest/_design/loclog/_view/by_time
>> I of course get a bunch of logs, ending with:
>> 
>> $ sudo tail /var/log/couchdb/couch.log
>> [Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Finishing reduce with 2 values
>> [Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Starting reduce with 2 values
>> [Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Finishing reduce with 2 values
>> [Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Starting reduce with 2 values
>> [Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Finishing reduce with 2 values
>> [Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Starting reduce with 1 values
>> [Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Finishing reduce with 1 values
>> [Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Starting reduce with 2 values
>> [Wed, 05 Sep 2012 04:58:06 GMT] [info] [<0.444.0>] OS Process #Port<0.1999> Log :: Finishing reduce with 2 values
>> [Wed, 05 Sep 2012 04:59:01 GMT] [info] [<0.32.0>] Apache CouchDB has started on http://0.0.0.0:5984/
>> 
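Log lines like the ones above could come from a wrapper along these lines. This is only a sketch of the start/end logging approach, not the code actually used in the design document: `withLogging`, `emitLog`, and the summing placeholder are hypothetical names, and the `console.log` fallback is an assumption so the snippet also runs under Node.js. Inside CouchDB's JavaScript query server the global `log()` function is what writes to couch.log.

```javascript
// `log` is supplied by CouchDB's JavaScript query server. The fallback below
// is an assumption for local experiments outside CouchDB (e.g. Node.js).
var emitLog = (typeof log === "function")
  ? log
  : function (msg) { console.log(msg); };

// Wrap any reduce function so each call is bracketed by two log lines.
function withLogging(innerReduce) {
  return function (keys, values, rereduce) {
    emitLog("Starting reduce with " + values.length + " values");
    var result = innerReduce(keys, values, rereduce);
    emitLog("Finishing reduce with " + values.length + " values");
    return result;
  };
}

// Placeholder reduce for demonstration only: sums numeric values.
var loggedSum = withLogging(function (keys, values, rereduce) {
  return values.reduce(function (a, b) { return a + b; }, 0);
});
```

A "Starting" line with no matching "Finishing" line would point at a crash inside the reduce call itself. In the log above every start is paired with a finish, which suggests the failure happens outside the JavaScript function. Note that a design document stores each view function as a single function expression, so in practice the logging calls would sit inside that one function rather than in a separate helper.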
>> 
>> The output of `top -u couchdb -d 1 -b > top_log.txt` tends to look something like this
>> around the time it "runs out of memory". Memory doesn't seem to be terribly low: swap is
>> almost entirely free, and even a decent chunk of real memory is available:
>> 
>> top - 21:14:27 up 22:29,  1 user,  load average: 4.23, 1.96, 0.94
>> Tasks:  59 total,   2 running,  57 sleeping,   0 stopped,   0 zombie
>> %Cpu(s):  1.5 us,  5.0 sy,  0.0 ni,  0.0 id, 90.0 wa,  0.0 hi,  3.5 si,  0.0 st
>> KiB Mem:    236880 total,   170896 used,    65984 free,      504 buffers
>> KiB Swap:  2461680 total,    36704 used,  2424976 free,     6028 cached
>> 
>> PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
>> 2982 couchdb   20   0  174m 123m  688 R   0.2 53.3   0:53.04 couchjs
>> 3003 couchdb   20   0 23188 1596  724 D   0.2  0.7   2:15.39 couchjs
>> 2566 couchdb   20   0  1716  312  240 S   0.1  0.1   0:00.05 couchdb
>> 2937 couchdb   20   0  1632  416  356 S   0.1  0.2   0:00.10 heart
>> 3027 couchdb   20   0  1716  512  448 S   0.1  0.2   0:00.01 sh
>> 
>> top - 21:14:28 up 22:29,  1 user,  load average: 4.23, 1.96, 0.94
>> Tasks:  56 total,   1 running,  55 sleeping,   0 stopped,   0 zombie
>> %Cpu(s): 42.9 us, 29.5 sy,  0.0 ni, 21.0 id,  5.7 wa,  0.0 hi,  1.0 si,  0.0 st
>> KiB Mem:    236880 total,   170896 used,    65984 free,      504 buffers
>> KiB Swap:  2461680 total,    36704 used,  2424976 free,     6028 cached
>> 
>> PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
>> 2566 couchdb   20   0  1716  312  240 S   0.0  0.1   0:00.05 couchdb
>> 3033 couchdb   20   0  3148  504  440 S   0.0  0.2   0:00.00 sleep
>> 
>> 
>> top - 21:14:30 up 22:29,  1 user,  load average: 4.23, 1.96, 0.94
>> Tasks:  56 total,   1 running,  55 sleeping,   0 stopped,   0 zombie
>> %Cpu(s):  1.9 us,  1.0 sy,  0.0 ni, 97.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
>> KiB Mem:    236880 total,    21552 used,   215328 free,      816 buffers
>> KiB Swap:  2461680 total,     9484 used,  2452196 free,     9932 cached
>> 
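To put the numbers in this thread side by side, the short calculation below converts the failed allocation from the crash message and the free memory from the first `top` sample into MiB. The figures are copied from the thread; the variable names are invented here, and the crash dump and the `top` samples may come from different runs, so this is a rough comparison rather than a diagnosis.

```javascript
// Figures copied from the thread. The crash dump and the `top` output may be
// from different runs, so treat the comparison as approximate.
var failedAllocBytes = 62690240; // "Cannot allocate 62690240 bytes of memory"
var freeRealKiB = 65984;         // "KiB Mem: ... 65984 free" (first sample)
var freeSwapKiB = 2424976;       // "KiB Swap: ... 2424976 free" (first sample)

function bytesToMiB(bytes) {
  return bytes / (1024 * 1024);
}

var failedAllocMiB = bytesToMiB(failedAllocBytes);
var freeRealMiB = bytesToMiB(freeRealKiB * 1024);
var freeSwapMiB = bytesToMiB(freeSwapKiB * 1024);

console.log("failed allocation: " + failedAllocMiB.toFixed(1) + " MiB");
console.log("free real memory:  " + freeRealMiB.toFixed(1) + " MiB");
console.log("free swap:         " + freeSwapMiB.toFixed(1) + " MiB");
```

The failed request is just under 60 MiB, while that sample shows roughly 64 MiB of real memory and well over 2 GiB of swap free. On these numbers alone the allocation looks like it should have fit, which is consistent with the observation that memory did not appear exhausted.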
>> 
>> This is with Erlang R15B01 (1:15.b.1-dfsg-3) and mozjs 1.8.5-1.0.0+dfsg-3.1, both armhf.
>> 
>> `ulimit -a` says:
>> core file size          (blocks, -c) 0
>> data seg size           (kbytes, -d) unlimited
>> scheduling priority             (-e) 0
>> file size               (blocks, -f) unlimited
>> pending signals                 (-i) 1849
>> max locked memory       (kbytes, -l) 64
>> max memory size         (kbytes, -m) unlimited
>> open files                      (-n) 1024
>> pipe size            (512 bytes, -p) 8
>> POSIX message queues     (bytes, -q) 819200
>> real-time priority              (-r) 0
>> stack size              (kbytes, -s) 8192
>> cpu time               (seconds, -t) unlimited
>> max user processes              (-u) 1849
>> virtual memory          (kbytes, -v) unlimited
>> file locks                      (-x) unlimited
>> 
>> 
>> And AFAICT, Erlang 32-bit builds may have an artificial heap limit of 2GB, but no longer
>> the relatively low 2^28 I see way back in the release logs. Any ideas why I'm hitting the
>> memory limit, or if it's possible to tune CouchDB to bite off smaller pieces at a time? It'd
>> be nice if I could get this design doc working along with the other, simpler ones.
>> 
>> thanks,
>> -natevw
>> 
>> 
>> 
>> On Sep 1, 2012, at 8:54 AM, R.J. Steinert wrote:
>> 
>>> Natevw,
>>> I'm also using CouchDB on Pi.  I'm new to CouchDB so excuse me if this is a
>>> dumb answer... Have you tried sprinkling your views with log() functions to
>>> figure out where the view is getting stuck?
>>> 
>>> --
>>> RJ Steinert
>>> http://rjsteinert.com
>>> 
>>> 
>>> 
>>> On Sat, Sep 1, 2012 at 1:55 AM, Nathan Vander Wilt <nate-lists@calftrail.com> wrote:
>>> 
>>>> I've got CouchDB mostly working on my Raspberry Pi, simply via `apt-get
>>>> install couchdb` plus the permissions fix Jens posted about recently.
>>>> 
>>>> However, I can't get a particularly complex design document to finish its
>>>> initial view generation. (See
>>>> https://github.com/natevw/LocLog/tree/master/views especially
>>>> https://github.com/natevw/LocLog/blob/master/views/by_utc/reduce.js for
>>>> source code.) Originally I was getting explicit timeout errors, so after
>>>> unsuccessfully trying more conservative values I cranked os_process_timeout
>>>> to 9000000. This got it a lot farther, but now it seems stuck with no
>>>> indication of what's going wrong except the server suddenly drops out
>>>> before getting respawned:
>>>> 
>>>> [Sat, 01 Sep 2012 04:55:55 GMT] [info] [<0.15090.1>] checkpointing view update at seq 2272 for loctest _design/loclog
>>>> [Sat, 01 Sep 2012 05:00:01 GMT] [info] [<0.15090.1>] checkpointing view update at seq 2409 for loctest _design/loclog
>>>> [Sat, 01 Sep 2012 05:09:49 GMT] [info] [<0.15090.1>] checkpointing view update at seq 2517 for loctest _design/loclog
>>>> [Sat, 01 Sep 2012 05:14:46 GMT] [info] [<0.32.0>] Apache CouchDB has started on http://0.0.0.0:5984/
>>>> 
>>>> [Sat, 01 Sep 2012 05:19:50 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET /_active_tasks 200
>>>> [Sat, 01 Sep 2012 05:19:55 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET /_active_tasks 200
>>>> [Sat, 01 Sep 2012 05:20:00 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET /_active_tasks 200
>>>> [Sat, 01 Sep 2012 05:20:05 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET /_active_tasks 200
>>>> [Sat, 01 Sep 2012 05:20:10 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET /_active_tasks 200
>>>> [Sat, 01 Sep 2012 05:20:15 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET /_active_tasks 200
>>>> [Sat, 01 Sep 2012 05:20:55 GMT] [info] [<0.32.0>] Apache CouchDB has started on http://0.0.0.0:5984/
>>>> 
>>>> 
>>>> Any idea how to determine what could cause this, and/or if there's a
>>>> remedy? My reduce function is rather float-heavy and I suspect perhaps the
>>>> package build is using soft floats instead of hardware (not sure how to
>>>> verify), but regardless the view made it this far and to see it simply fail
>>>> without so much as a trace is a new one to me. I don't particularly suspect
>>>> an out-of-memory condition — the whole database is <100MB (albeit snappy
>>>> compressed) and this is spread across well over 5000 separate documents.
>>>> 
>>>> thanks,
>>>> -natevw
>> 
> 

