couchdb-user mailing list archives

From Glenn Rempe <>
Subject Re: Timeout Error when trying to access views + Indexing problems
Date Tue, 06 Oct 2009 06:46:12 GMT
Hi Adam (and group!).  I am still struggling, and need help or advice.  It
is much appreciated.
Adam, I tried your suggestion to attempt to open the index file in
interactive mode.  I was able to start CouchDB in interactive mode, and the
'couch_file:open' command returned ok instantly, but
'couch_file:read_header(Fd).' produced no output for about 15 min, and I
eventually aborted it.  Should it ever take that long?  What output should I
expect to see?

Please note that, to make sure there was nothing wrong with the index
files, I actually destroyed them and tried a rebuild from scratch.  This
seemed to be going ok, and the indexing process ran all night.  It had
indexed roughly 20% of the 22 million records in the DB when it
mysteriously died yet again.  I am noticing that this is the point at which
I always see the process fail.

When the indexing fails it provides NO information in the couchdb log (with
log level set to 'debug').  I backtracked in the couchdb log file to find
where it stopped working, looking for a stack trace.  All I found was the
point at which it just stopped reporting progress and the couchjs process
was dead.  Here is that point in the log file:

And this is the last line reporting indexing status (line 25 in the pastie):

[Mon, 05 Oct 2009 18:52:42 GMT] [debug] [<0.80.0>] New task status for
searchlight_production _design/SearchDocument: Processed 4377984 of 22650375
changes (19%)

After that, just silence in the log.

Here is one hint which suggests that perhaps it is not just the couchjs
process that is crashing, but the entire couchdb process.  In
/var/log/syslog I see the following three minutes after the last entry
showing index status in the couch log:

This is being logged by monit, which I am using to keep an eye on couchdb.
I may turn off monit and see if the entire couchdb process crashes when the
indexing stops.

Even with monit turned off, it looks like the couchdb pids have changed
during or after the indexing process failure.  This is the 'ps -ef' output
after the crash, to compare with the output in the previous link which was
taken when the indexing was started:

If I try to access that view in a rails console session after this failure,
I get a 500 error:

>> counts = SearchDocument.by_engine(:reduce => true, :group => true, :stale
=> 'ok')['rows']
RestClient::RequestFailed: HTTP status code 500
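For reference, here is a sketch of the view URL that rails call should map
to.  The database name comes from this thread; the design doc / view path
("_design/SearchDocument/_view/by_engine") is my assumption based on the
model and method names:

```javascript
// Sketch only: reconstructing the view query behind the Rails call above.
// The "_view/by_engine" path is an assumption, not confirmed in this thread.
const params = new URLSearchParams({
  reduce: "true",
  group: "true",
  stale: "ok",  // serve the existing index; do not trigger a rebuild
});
const url = "http://127.0.0.1:5984/searchlight_production" +
            "/_design/SearchDocument/_view/by_engine?" + params;
console.log(url);
```

Since stale=ok is supposed to answer from whatever index already exists
without kicking off indexing, getting a 500 here makes me suspect the view
file itself, not the rebuild.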

And I see the following stack trace in the couch log:

[Mon, 05 Oct 2009 21:53:00 GMT] [info] [<0.301.0>] Stacktrace:

Here is the full output:

Is it possible that either:

- there is some kind of corruption in the main DB file, and the crash
happens when this point (or a specific record in the DB?) is reached?  If
so, how can I best identify this?
- there is some file system related issue?  I tend to discount this option
just because it always seems to fail at roughly the same point in the
indexing process.
What are the recommended next steps for finding the root cause of the issue?
 Should I insert a log() statement in the view code and try to run the whole
index with that in place?
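In case it helps, here is the kind of thing I had in mind.  The real map
functions in _design/SearchDocument aren't shown in this thread, so the
field names below are invented for illustration:

```javascript
// Hypothetical map function -- the actual views are not shown in this
// thread.  log() writes a line to couch.log for every doc the view server
// maps, so the last logged _id before the crash should point at the
// record being processed when it died.
var map = function (doc) {
  log("mapping doc: " + doc._id);
  if (doc.engine) {
    emit(doc.engine, 1);
  }
};
```

The downside is that logging 22 million lines will make the log huge, but
it should pin down the exact doc where indexing stops.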

UPDATE: I tried to run the index again with monit turned off, since I saw
some messages in the syslog about the couchdb process.  I was hoping that
perhaps monit was inadvertently killing and restarting couchdb, causing the
index failure.  Alas, this was not the case.  It just died on me again
after 3.6 million records.  And there are some different error messages in
the log immediately after it stopped reporting on indexing progress.
Please take a look at this pastie to see the logs:

(There's a block of error messages starting at line 25.)

I am going to try tonight splitting the 7 views I currently have in a
single design doc into 4 design docs, according to the part of the app they
pertain to.  I am hoping this will help narrow things down if some design
docs finish indexing and others don't.
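As a sketch of the plan (the design doc and view names below are
placeholders, not my real ones):

```javascript
// Hypothetical layout.  All views in one design doc share a single index
// file and are built together in one pass, so one bad map function stalls
// all 7 views; separate design docs get separate index files and should
// succeed or fail independently.
var designDocs = [
  {
    _id: "_design/search",
    language: "javascript",
    views: {
      by_engine: { map: "function(doc) { if (doc.engine) emit(doc.engine, 1); }" }
    }
  },
  {
    _id: "_design/users",
    language: "javascript",
    views: {
      by_user: { map: "function(doc) { if (doc.user) emit(doc.user, null); }" }
    }
  }
  // ...two more design docs for the remaining views
];
```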

Thanks again for the help (and reading this long message!).


On Sun, Oct 4, 2009 at 10:19 AM, Adam Kocoloski <> wrote:

> Hi Glenn, I saw something like this once, but unfortunately I wasn't able
> to resolve it.  Can you try the following?
> 1) Start couchdb with the -i option to get an interactive shell
> 2) At the prompt, try to open the view file and read the header
> {ok, Fd} =
> couch_file:open("/vol/couchdb/var/lib/couchdb/.searchlight_production_design/5f190ffb2ed589746e8796d2423869ac.view").
> couch_file:read_header(Fd).
> In my case, that call timed out, and the stacktrace during normal operation
> was exactly the one you showed earlier in this thread.  Best,
> Adam
> On Oct 4, 2009, at 2:07 AM, Glenn Rempe wrote:
>> Thanks for the reply Paul.  Some comments below.
>> Also, just for full disclosure, the CouchDB I am working on was moved
>> out of another couchdb, and it was originally created using CDB 0.9.1.
>> I show a dir listing below that indicates exactly what was moved.
>> On Sat, Oct 3, 2009 at 6:46 PM, Paul Joseph Davis <> wrote:
>>> Glenn,
>>> This sounds like your map function is timing out which causes the error.
>>> You could try upping the os process timeout setting in the config.
>> When I go into futon and select one of my views in my design doc, it
>> *always* pops up a javascript alert with the error text ~5 seconds after
>> selecting the view.  It doesn't seem to matter what else I do.  It also
>> didn't vary when I changed the os_process_timeout value in futon, as you
>> suggested, from 5000 to 25000.  Can you explain exactly what this
>> particular param is doing?  I assume the value is milliseconds?
>>> To see what's going on you can increase to debug logging or use the log
>>> function in your maps.  There's also the status page in futon which I
>>> think you said you were looking at.
>> Yes, I was previously looking at the status page.  But now, since I've
>> upgraded to trunk, I never see any indexing activity happening in the
>> status page, no matter what I do.
>>> If indexing crashes it should just pick up where it left off when you
>>> retrigger.  Use the status page to verify.  If it's not then let us know.
>> Can you clarify: is this also the case when no index has ever
>> successfully run?  I was wondering if I first need to get through at
>> least one index session (maybe with a smaller number of records) before
>> incremental indexing works as expected.
>> Is there any way to determine what percentage of the total records have
>> been added to the index?
>> For your info, here are the contents of the DB dir.  You can see the
>> main DB is 42GB now (~17 million records).
>> root@ip-10-250-55-239:/home/rails/underscore-sync-mysql-to-couchdb/shared/log#
>> ls -la /vol/couchdb/var/lib/couchdb
>> total 41674956
>> drwxr-xr-x 3 couchdb root             78 2009-10-04 00:46 .
>> drwxr-xr-x 3 couchdb root             20 2009-10-03 05:02 ..
>> -rw-r--r-- 1 couchdb couchdb 42675073133 2009-10-04 02:13 searchlight_production.couch
>> drwxr-xr-x 2 couchdb couchdb         120 2009-10-03 06:02 .searchlight_production_design
>> root@ip-10-250-55-239:/home/rails/underscore-sync-mysql-to-couchdb/shared/log#
>> ls -la /vol/couchdb/var/lib/couchdb/.searchlight_production_design/
>> total 33700196
>> drwxr-xr-x 2 couchdb couchdb         120 2009-10-03 06:02 .
>> drwxr-xr-x 3 couchdb root             78 2009-10-04 00:46 ..
>> -rw-r--r-- 1 couchdb couchdb  9819347287 2009-10-03 08:04 5f190ffb2ed589746e8796d2423869ac.view
>> -rw-r--r-- 1 couchdb couchdb    91402872 2009-10-03 06:03 b127a58306fb8e7858cd1a92f8398511.view
>> -rw-r--r-- 1 couchdb couchdb 24598236884 2009-10-02 13:00 SearchDocument.view
>>> If you can't find anything in the debug logs then ping the list and
>>> we'll get into trying to duplicate.
>> I have turned on the 'debug' level in the logs, and that provided the
>> info I previously posted.  I'll try to use the log function in the map
>> and see if that shows anything.
>> Thanks for helping.  If it comes to it, I may be able to make a snapshot
>> of this EBS volume and start a host that you could log in to and get
>> your hands directly on it, if that would be helpful.
>> Glenn

Glenn Rempe

email                 :
voice                 : (415) 894-5366 or (415)-89G-LENN
twitter                : @grempe
contact info        :
pgp                    :
