incubator-couchdb-user mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Timeout Error when trying to access views + Indexing problems
Date Tue, 06 Oct 2009 18:28:12 GMT
On Tue, Oct 6, 2009 at 2:21 PM, Glenn Rempe <glenn@rempe.us> wrote:
> Thanks Paul.  Comments below.
>
> On Tue, Oct 6, 2009 at 11:01 AM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>
>>
>> Glenn,
>>
>> The quickest way to check if you have a bad document in your DB would
>> probably be something like:
>>
>> $ ps ax | grep beam.smp
>> $ curl 'http://127.0.0.1:5984/db_name/_all_docs?include_docs=true' > /dev/null
>> $ ps ax | grep beam.smp
>>
>> You only need the doc to pass through the JSON serializer to
>> trigger the badness.
>>
>>
> I am running this now.
>
>
>> If it's being restarted by heart, then it's most likely a complete VM
>> death. The fact the PID is changing suggests that you're hitting VM
>> death. And on complete VM death there is nothing CouchDB can do to
>> help. VM deaths are instant and dramatic. Have you tried checking the
>> memory allocated to the beam.smp process as it gets further along? A
>> common cause of instant VM deaths is when malloc returns NULL.
>>
>>
> I have kept an eye on the overall system memory usage.  The EC2 XLarge
> instance I am running on has 15GB RAM, and I have never seen the RAM usage
> go over 4-5GB since I switched to XLarge.  Is there a specific command you
> suggest for tracking memory explicitly assigned to the beam?
>

I'm not very high tech here: generally just top and free to get an
idea. Memory reporting is kinda wonky, so I generally only check for
order-of-magnitude changes. Though the next time you start an
indexing run, a small script that spins and records the high-water-mark
memory allocation for that PID could prove useful if it's a major spike
that causes the VM death.
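
A minimal sketch of such a script, assuming a Linux box where
/proc/<pid>/status is readable (the function names read_rss_kb and
watch_peak are made up for illustration, nothing from CouchDB). It
polls the resident set size of one PID, logs each new peak, and
returns the peak once the process disappears:

```python
import time


def read_rss_kb(pid):
    """Return the resident set size of `pid` in kB, or None if it is gone.

    Assumes Linux: parses the VmRSS line of /proc/<pid>/status.
    """
    try:
        with open(f"/proc/{pid}/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except (FileNotFoundError, ProcessLookupError):
        return None
    return None  # no VmRSS line (e.g. a zombie): treat as gone


def watch_peak(pid, interval=1.0, read=read_rss_kb, sleep=time.sleep, log=print):
    """Poll `pid` until it disappears; return the highest RSS seen, in kB."""
    peak = 0
    while True:
        rss = read(pid)
        if rss is None:
            return peak
        if rss > peak:
            peak = rss
            # Log every new peak so the last line survives a sudden death.
            log(f"{time.strftime('%H:%M:%S')} new peak: {peak} kB")
        sleep(interval)
```

Call watch_peak(<pid>) with the PID reported by ps ax | grep beam.smp
just after kicking off the indexing run. Since heart restarts the VM
under a new PID, the loop ends when the old PID vanishes, and the last
logged peak shows whether memory spiked right before the death. A
one-second interval can miss a very fast spike, so treat the number as
a lower bound.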

>
>
>> Also, I just went through and re-read the entire discussion. After
>> your 0.9.1 -> trunk upgrade did you compact the database? I can't
>> think of anything that'd cause an issue there but it might be
>> something to try (there is a conversion process during compaction).
>>
>>
> I did not do a compaction.  I can try that.  Unfortunately that probably
> kills another day compacting my 50GB 28mm record DB.  ;-)  But, hey, if it
> helps... :-)
>

It's a possibility is all. Theoretically compaction is more
incremental, so even if you kick it off and it dies, it'll pick up
partway through on restart, even without a complete run. (Very
theoretically, as I haven't tried it yet.) Also, it'll run just fine
in the background.
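
For reference, a rough sketch of kicking off compaction and watching
it from a script. It assumes the server at 127.0.0.1:5984 from the
curl line earlier in the thread, and relies on the POST /db/_compact
endpoint and the compact_running field in the database info document.
The helper names are hypothetical, and the Content-Type header is
what later CouchDB releases expect:

```python
import json
import time
import urllib.request

BASE = "http://127.0.0.1:5984"  # host and port from the curl example in the thread


def start_compaction(db, base=BASE, opener=urllib.request.urlopen):
    """POST /<db>/_compact; compaction then runs in the background."""
    req = urllib.request.Request(
        f"{base}/{db}/_compact",
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with opener(req) as resp:
        return json.load(resp)


def compaction_running(db, base=BASE, opener=urllib.request.urlopen):
    """GET /<db> and report the compact_running flag from the db info."""
    with opener(f"{base}/{db}") as resp:
        return bool(json.load(resp).get("compact_running"))


def wait_for_compaction(db, interval=60, running=compaction_running, sleep=time.sleep):
    """Block until compact_running goes false; return how many polls it took."""
    polls = 0
    while running(db):
        polls += 1
        sleep(interval)
    return polls
```

Typical use would be start_compaction("db_name") followed by
wait_for_compaction("db_name"). If the VM dies mid-compaction, the
poll raises a connection error instead of returning, which is itself a
useful signal. Whether a restarted compaction really resumes partway
through is, as said, untested.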

>
>> If the db dump and compaction don't show anything then we'll take a
>> look at writing some scripts to go through and check docs and add some
>> reporting to the view generation process to try and get a handle on
>> what's going on.
>>
>> Paul Davis
>>
>
> So there is no way to turn on an additional level of debugging in the view
> generation process with the current code?  I noticed that there is a 'tmi'
> logging level in the erlang couchdb code (which I just turned on).  Will
> this help?

A TMI log level is news to me. I've never seen a log macro that uses it.

> Again, thanks.  I know this is my problem, but knowing that there are some
> people willing to lend a hand, and maybe write some code to help identify /
> resolve this is what's keeping me going.  :-)  Much appreciated.  And
> hopefully couchdb will be the better for it in the end.
>
> Glenn
>

Don't worry. I quite dislike not figuring out the cause of anything
that sounds even remotely like a bug in CouchDB.

Paul Davis
