incubator-couchdb-user mailing list archives

From Nathan Vander Wilt <nate-li...@calftrail.com>
Subject Re: Complex view generation stuck, never gets past silent crash (Raspberry Pi)
Date Sat, 01 Sep 2012 21:03:04 GMT
On Sep 1, 2012, at 10:27 AM, Dave Cottlehuber wrote:
> On 1 September 2012 07:55, Nathan Vander Wilt <nate-lists@calftrail.com> wrote:
>> I've got CouchDB mostly working on my Raspberry Pi, simply via `apt-get install couchdb` plus the permissions fix Jens posted about recently.
>> 
>> However, I can't get a particularly complex design document to finish its initial view generation. (See https://github.com/natevw/LocLog/tree/master/views, especially https://github.com/natevw/LocLog/blob/master/views/by_utc/reduce.js for source code.) Originally I was getting explicit timeout errors, so after unsuccessfully trying more conservative values I cranked os_process_timeout to 9000000. This got it a lot farther, but now it seems stuck with no indication of what's going wrong except the server suddenly drops out before getting respawned:
>> 
>> [Sat, 01 Sep 2012 04:55:55 GMT] [info] [<0.15090.1>] checkpointing view update at seq 2272 for loctest _design/loclog
>> [Sat, 01 Sep 2012 05:00:01 GMT] [info] [<0.15090.1>] checkpointing view update at seq 2409 for loctest _design/loclog
>> [Sat, 01 Sep 2012 05:09:49 GMT] [info] [<0.15090.1>] checkpointing view update at seq 2517 for loctest _design/loclog
>> [Sat, 01 Sep 2012 05:14:46 GMT] [info] [<0.32.0>] Apache CouchDB has started on http://0.0.0.0:5984/
>> 
>> [Sat, 01 Sep 2012 05:19:50 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET /_active_tasks 200
>> [Sat, 01 Sep 2012 05:19:55 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET /_active_tasks 200
>> [Sat, 01 Sep 2012 05:20:00 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET /_active_tasks 200
>> [Sat, 01 Sep 2012 05:20:05 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET /_active_tasks 200
>> [Sat, 01 Sep 2012 05:20:10 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET /_active_tasks 200
>> [Sat, 01 Sep 2012 05:20:15 GMT] [info] [<0.121.0>] 192.168.1.6 - - GET /_active_tasks 200
>> [Sat, 01 Sep 2012 05:20:55 GMT] [info] [<0.32.0>] Apache CouchDB has started on http://0.0.0.0:5984/
>> 
>> 
>> Any idea how to determine what could cause this, and/or if there's a remedy? My reduce function is rather float-heavy and I suspect perhaps the package build is using soft floats instead of hardware (not sure how to verify), but regardless the view made it this far and to see it simply fail without so much as a trace is a new one to me. I don't particularly suspect an out-of-memory condition: the whole database is <100MB (albeit snappy compressed) and this is spread across well over 5000 separate documents.
>> 
>> thanks,
>> -natevw
> 
> Does it pass the test suite?
> If not, what errors are coming up?
> If it does, you might try running couchjs directly like this:
> 
> /usr/local/bin/couchjs /usr/local/share/couchdb/server/main.js
> 
> & read http://wiki.apache.org/couchdb/View_server?action=show&redirect=ViewServer#Basic_API
> for driving this.
> 
> with a few of your docs & see what happens.
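
A minimal sketch of how the suggestion above could be tried, assuming the `reset`, `add_fun`, and `map_doc` commands described on the linked view server wiki page, each sent as one JSON array per line on couchjs's standard input. The map function source, the sample document, and the helper name `view_server_commands` are hypothetical placeholders, not taken from the LocLog design document.

```python
import json


def view_server_commands(map_source, docs):
    """Build newline-delimited JSON commands for one map run.

    Emits a reset, registers a single map function, then feeds each
    document through it, one command per line.
    """
    commands = [["reset"], ["add_fun", map_source]]
    commands += [["map_doc", doc] for doc in docs]
    return "".join(json.dumps(cmd) + "\n" for cmd in commands)


# Hypothetical map function and document, for illustration only.
MAP_SOURCE = "function (doc) { emit(doc._id, null); }"
SAMPLE_DOCS = [{"_id": "example-doc", "lat": 46.2, "lon": -119.1}]

if __name__ == "__main__":
    # Print the commands so they can be piped into couchjs.
    print(view_server_commands(MAP_SOURCE, SAMPLE_DOCS), end="")
```

Saved under a hypothetical name such as `drive_couchjs.py`, the output could be piped into the command quoted above (`python3 drive_couchjs.py | /usr/local/bin/couchjs /usr/local/share/couchdb/server/main.js`); the paths may differ for a packaged install. Swapping in a few real documents and the real map source would show whether couchjs itself exits unexpectedly.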


Thanks, yes it does pass most of the test suite (via Futon in Firefox). The only failures are replication-related:

replication 301560ms
1. Assertion failed: copy !== null
2. Exception raised: {}

replicator_db 35653ms
1. Assertion 'typeof repDoc._replication_stats === "object", "doc has stats"' failed: doc has stats
2. Exception raised: {}


To be clear, it is not the processing of an individual document that fails, nor view generation
in general. I have managed to complete a "catch-up run" of (simpler) views in a different
design document on a different dataset. On this particular view, before I increased the
timeout, it made it to sequence 78 before stopping with a timeout logged after an error. I
set the timeout to 2.5 hours and it got a lot farther. But...
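
For reference, the 2.5 hours here is the same 9000000 quoted earlier, since os_process_timeout is given in milliseconds. A small sketch of that arithmetic follows, assuming the setting lives in the [couchdb] section of local.ini as in a stock CouchDB 1.x configuration (worth verifying against the packaged default.ini). The helper names are hypothetical.

```python
def hours_to_ms(hours):
    """Convert hours to whole milliseconds."""
    return int(hours * 60 * 60 * 1000)


def local_ini_fragment(timeout_ms):
    """Render the config lines for a given timeout.

    Assumes os_process_timeout belongs in the [couchdb] section.
    """
    return "[couchdb]\nos_process_timeout = {}\n".format(timeout_ms)


if __name__ == "__main__":
    print(local_ini_fragment(hours_to_ms(2.5)), end="")
```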

The problem I am having now is that the view can't get past its current checkpoint (over halfway
through the change sequences) and there is no trace of why not: the whole server just disappears
until restarted. So basically I can query the view with ?stale=ok and get some of the data, but
if I want the full set from what's in the database, my view request waits a while and then
drops when the server disappears five or ten minutes into the view update. It seems to
be hitting some sort of very unexpected issue. It's not a simple timeout, as those were
logged. I don't *think* it's a memory issue, as none of the documents are particularly large,
and up until then I see most of my Pi's real memory still available as well as all the swap.
(Perhaps the first [not re-]reduce dataset is somehow overlarge, as it's my reduce function
that is the complex part, but I would expect the index to be reasonably balanced…)

Under what conditions during view generation would the entire CouchDB server simply abend
without leaving any indication in its logs?
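
This does not explain the cause, but one way to at least quantify the silent gaps is to scan the log for checkpoint and startup lines and measure how long the server was quiet before each restart. The sketch below assumes the log format shown in the excerpt quoted earlier in this message, with wrapped lines rejoined; the regular expressions and helper names are my own, not part of CouchDB.

```python
import re
from datetime import datetime

# Matches lines like: [Sat, 01 Sep 2012 04:55:55 GMT] [info] [<0.15090.1>] message
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+) GMT\] \[info\] \[<[\d.]+>\] (?P<msg>.*)$")
CHECKPOINT_RE = re.compile(r"checkpointing view update at seq (\d+)")
STARTED = "Apache CouchDB has started"

# Checkpoint and startup lines from the excerpt in this thread.
LOG = [
    "[Sat, 01 Sep 2012 04:55:55 GMT] [info] [<0.15090.1>] checkpointing view update at seq 2272 for loctest _design/loclog",
    "[Sat, 01 Sep 2012 05:00:01 GMT] [info] [<0.15090.1>] checkpointing view update at seq 2409 for loctest _design/loclog",
    "[Sat, 01 Sep 2012 05:09:49 GMT] [info] [<0.15090.1>] checkpointing view update at seq 2517 for loctest _design/loclog",
    "[Sat, 01 Sep 2012 05:14:46 GMT] [info] [<0.32.0>] Apache CouchDB has started on http://0.0.0.0:5984/",
    "[Sat, 01 Sep 2012 05:20:55 GMT] [info] [<0.32.0>] Apache CouchDB has started on http://0.0.0.0:5984/",
]


def parse_events(lines):
    """Return (time, kind, seq) tuples for checkpoint and restart lines."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        when = datetime.strptime(match.group("ts"), "%a, %d %b %Y %H:%M:%S")
        msg = match.group("msg")
        checkpoint = CHECKPOINT_RE.search(msg)
        if checkpoint:
            events.append((when, "checkpoint", int(checkpoint.group(1))))
        elif STARTED in msg:
            events.append((when, "restart", None))
    return events


def seconds_before_each_restart(events):
    """Seconds between each restart and the event logged just before it."""
    gaps = []
    previous = None
    for when, kind, _seq in events:
        if kind == "restart" and previous is not None:
            gaps.append(int((when - previous).total_seconds()))
        previous = when
    return gaps


if __name__ == "__main__":
    print(seconds_before_each_restart(parse_events(LOG)))
```

On the excerpt above this gives gaps of 297 and 369 seconds, which matches the "five or ten minutes" between the last sign of life and each respawn.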

regards,
-natevw