incubator-couchdb-user mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Timeout Error when trying to access views + Indexing problems
Date Tue, 06 Oct 2009 18:01:49 GMT
On Tue, Oct 6, 2009 at 12:28 PM, Glenn Rempe <glenn@rempe.us> wrote:
> I spoke too soon.  The indexing just died after running all night and
> getting up to ~6mm records indexed.  :-(
> It *almost* feels like there's a correlation between my accessing the system
> through futon to check on status, and its failing silently...
>
> Look at this log from this AM.  Its been running all night, and then all of
> a sudden 'CouchDB has restarted' message when I am accessing futon.  Is it
> possible that something related to futon is slowed or died forcing a restart
> of *everything* and killing my indexing process? See:
>
> http://pastie.org/643915
>
> There has *got* to be a way to bump up the logging level on all of these
> processes.  Is that single line in the log about restarting CouchDB really
> all I get?  With no indication at ALL of WHY its restarted (and apparently
> killed my index processing in the process, or is that the indexing process
> has died forcing a restart?  The logs just don't seem to want to give that
> info up.) ?  Am I missing something?  I NEED logging data to find out what
> the hell is going on here.  The silent death treatment is driving me nuts.
>  Sorry, I am frustrated and this indexing is literally the last step to
> bringing a production system on line for its tests.  If I can't get these
> indexes built, couchdb will have been a complete failure for me after weeks
> of dev to convert a system to use it.
>
> <help?>
>
> Glenn
>
> On Tue, Oct 6, 2009 at 8:53 AM, Glenn Rempe <glenn@rempe.us> wrote:
>
>> Would replicating the DB to the same host perform those checks?  Also, if I
>> setup the auto-index every X # of records script shown on the wiki would
>> that be run on indexing?  These two combined might allow me to essentially
>> scan and check the records as migrated, and build indexes incrementally from
>> the get go.
>> Is there another way to run a scan for 'invalid' records across the whole
>> db?  Could I write a script to loop through all records?  And if I did what
>> would I be looking for?  That the JSON parses?  What else?
>>
>> Also, a new data point.  Last night before going to bed I split up my 7
>> views which were all in one design doc into 4 docs (1 x 1view, 3 x 2views).
>>  I started indexing one of them last night and this morning its still
>> running.  Its a simple view with a map and reduce:
>>
>> function(doc) {
>>   if( (doc['couchrest-type'] == 'SearchDocument') && doc.engine) {
>>     emit(doc.engine, 1);
>>   }
>> }
>>
>> function(keys, values, rereduce) {
>>   return sum(values);
>> }
>>
>> Processed 6570762 of 28249510 changes (23%)
>>
>> 6 million records is higher than I gotten on previous attempts which seemed
>> to bork at around ~4mm.
>>
>> Strange.
>>
>> G
>>
>> On Tue, Oct 6, 2009 at 6:29 AM, Curt Arnold <carnold@apache.org> wrote:
>>
>>>
>>> On Oct 6, 2009, at 1:46 AM, Glenn Rempe wrote:
>>>
>>>>
>>>> - there is some kind of corruption in the main DB file and when this
>>>> point
>>>> is reached (or a specific record in the DB?) that it crashes? If so how
>>>> can
>>>> I best identify this?
>>>>
>>>
>>> Inserting mal-encoded documents into CouchDB could interfere with document
>>> retrieval and indexing, see
>>> https://issues.apache.org/jira/browse/COUCHDB-345.  Possibly one of those
>>> got into your database and now it is stopping the rebuilding of views.  A
>>> patch recently got added to prevent mal-encoded documents from being
>>> accepted, but it does not fix the problem on an existing database that has
>>> been corrupted.   I do not know if the symptoms are the same as what you are
>>> observing, but I think it would be a likely culprit.
>>>
>>
>>
>>
>> --
>> Glenn Rempe
>>
>> email                 : glenn@rempe.us
>> voice                 : (415) 894-5366 or (415)-89G-LENN
>> twitter                : @grempe
>> contact info        : http://www.rempe.us/contact.html
>> pgp                    : http://www.rempe.us/gnupg.txt
>>
>>
>
>
> --
> Glenn Rempe
>
> email                 : glenn@rempe.us
> voice                 : (415) 894-5366 or (415)-89G-LENN
> twitter                : @grempe
> contact info        : http://www.rempe.us/contact.html
> pgp                    : http://www.rempe.us/gnupg.txt
>

Glenn,

The quickest way to check if you have a bad document in your DB would
probably be something like:

$ ps ax | grep beam.smp
$ curl 'http://127.0.0.1:5984/db_name/_all_docs?include_docs=true' > /dev/null
$ ps ax | grep beam.smp

A bad doc only needs to pass through the JSON serializer to trigger
the badness, so dumping every doc is enough. If the beam.smp PID
differs between the two ps runs, the VM was restarted.
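
If you'd rather script that check than eyeball the PIDs, here is a
minimal Python sketch of the same three steps. The function names
(beam_pids, dump_all_docs, vm_died_during_dump) are made up for
illustration, and it assumes `ps ax` prints the usual
PID TTY STAT TIME COMMAND columns and that the DB really is called
db_name on 127.0.0.1:5984.

```python
import http.client
import subprocess
import urllib.request


def beam_pids(ps_output):
    """Return the PIDs in `ps ax` output whose command mentions beam.smp."""
    pids = set()
    for line in ps_output.splitlines():
        fields = line.split(None, 4)  # assumed columns: PID TTY STAT TIME COMMAND
        if len(fields) == 5 and fields[0].isdigit() and "beam.smp" in fields[4]:
            pids.add(int(fields[0]))
    return pids


def current_beam_pids():
    """Run `ps ax` and extract the beam.smp PIDs."""
    out = subprocess.run(["ps", "ax"], capture_output=True, text=True).stdout
    return beam_pids(out)


def dump_all_docs(base_url, db_name):
    """Stream every doc through the JSON serializer and discard the bytes."""
    url = f"{base_url}/{db_name}/_all_docs?include_docs=true"
    with urllib.request.urlopen(url) as resp:
        while resp.read(1 << 20):  # read in 1 MiB chunks, like > /dev/null
            pass


def vm_died_during_dump(base_url="http://127.0.0.1:5984", db_name="db_name"):
    """Return True if the set of beam.smp PIDs changed across the dump."""
    before = current_beam_pids()
    try:
        dump_all_docs(base_url, db_name)
    except (OSError, http.client.HTTPException) as exc:
        # A dropped connection is expected if the VM dies mid-dump.
        print("dump did not finish:", exc)
    after = current_beam_pids()
    return before != after
```

Calling vm_died_during_dump() and getting True would point at the same
thing as a changed PID in the manual check.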

If it's being restarted by heart, then it's most likely a complete VM
death. The fact that the PID is changing suggests that you're hitting
VM death. And on complete VM death there is nothing CouchDB can do to
help, logging included. VM deaths are instant and dramatic. Have you
tried checking the memory allocated to the beam.smp process as it gets
further along? A common cause of instant VM deaths is when malloc
returns NULL.
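
To watch the memory while the index builds, something along these
lines might do. It is only a sketch: the names (parse_rss_kib,
sample_rss_kib, watch) are hypothetical, and it assumes your ps
accepts `-o rss= -p PID` and reports the resident set size in KiB, as
the Linux and BSD/macOS versions do.

```python
import subprocess
import time


def parse_rss_kib(ps_output):
    """Parse the output of `ps -o rss= -p PID`; None if the PID is gone."""
    text = ps_output.strip()
    return int(text) if text.isdigit() else None


def sample_rss_kib(pid):
    """Ask ps for the resident set size of one process (assumed KiB)."""
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True,
        text=True,
    ).stdout
    return parse_rss_kib(out)


def watch(pid, interval_s=60.0):
    """Print the RSS of pid every interval_s seconds until it disappears."""
    while True:
        rss = sample_rss_kib(pid)
        stamp = time.strftime("%H:%M:%S")
        if rss is None:
            print(f"{stamp} pid {pid} is gone")
            return
        print(f"{stamp} pid {pid} rss={rss / 1024:.1f} MiB")
        time.sleep(interval_s)
```

Run watch() against the beam.smp PID in a separate terminal. If the
last few samples before "is gone" keep climbing toward the machine's
RAM, that would fit the malloc theory.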

Also, I just went through and re-read the entire discussion. After
your 0.9.1 -> trunk upgrade did you compact the database? I can't
think of anything that'd cause an issue there but it might be
something to try (there is a conversion process during compaction).
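
In case it helps, compaction is started with a POST to the database's
_compact resource, and the compact_running field of the database info
tells you whether it is still going. A rough Python sketch follows.
The helper names are hypothetical, and the Content-Type header and
lack of credentials are assumptions that may need adjusting for your
build.

```python
import json
import urllib.request


def compact_request(base_url, db_name):
    """Build the POST that asks CouchDB to start compacting db_name."""
    return urllib.request.Request(
        f"{base_url}/{db_name}/_compact",
        data=b"",
        method="POST",
        # Assumption: harmless on older builds, required by newer ones.
        headers={"Content-Type": "application/json"},
    )


def start_compaction(base_url, db_name):
    """Send the compaction request and return the parsed JSON reply."""
    with urllib.request.urlopen(compact_request(base_url, db_name)) as resp:
        return json.load(resp)


def compaction_running(base_url, db_name):
    """Read the database info and report its compact_running flag."""
    with urllib.request.urlopen(f"{base_url}/{db_name}") as resp:
        return bool(json.load(resp).get("compact_running"))
```

For example, start_compaction("http://127.0.0.1:5984", "db_name")
followed by polling compaction_running() until it returns False.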

If the db dump and compaction don't show anything then we'll take a
look at writing some scripts to go through and check docs and add some
reporting to the view generation process to try and get a handle on
what's going on.
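
As a starting point for such a script, here is one possible shape in
Python. It is not something that exists in CouchDB. It pages through
_all_docs with include_docs=true and reports the last doc id it read
successfully before a request failed. fetch_page and scan are
hypothetical names, and the page size is arbitrary.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(base_url, db_name, start_key, limit):
    """Fetch one page of _all_docs with full docs; return the parsed rows."""
    params = {"include_docs": "true", "limit": str(limit)}
    if start_key is not None:
        params["startkey"] = json.dumps(start_key)  # keys are JSON-encoded
    url = f"{base_url}/{db_name}/_all_docs?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["rows"]


def scan(fetch, page_size=1000):
    """Walk the DB page by page.

    Returns (last_good_id, None) if every page loaded, or
    (last_good_id, error) for the first page that failed.
    """
    start_key, last_id = None, None
    while True:
        try:
            # Ask for one extra row; its id becomes the next startkey.
            rows = fetch(start_key, page_size + 1)
        except Exception as exc:
            return last_id, exc
        for row in rows[:page_size]:
            last_id = row["id"]
        if len(rows) <= page_size:
            return last_id, None
        start_key = rows[page_size]["id"]
```

Usage would be something like
scan(lambda key, n: fetch_page("http://127.0.0.1:5984", "db_name", key, n)).
If it comes back with an error, the suspect doc sits somewhere in the
page right after the returned id, and rerunning from there with a
smaller page_size should narrow it down.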

Paul Davis
