Subject: Re: An old database causes couchdb 1.3.0 to crash getting single documents
From: Alexander Shorin
To: user@couchdb.apache.org
Date: Wed, 1 May 2013 15:13:50 +0400

On Mon, Apr 29, 2013 at 8:16 PM, James Marca wrote:
>
> I was able to get couchdb 1.2.x running on this machine, but 1.1.x
> dies on start.
>
> 1.2.x does not compact this db. I got some quick erlang errors, then
> RAM usage slowly rose to 95% so I killed it.
>
> Other dbs of the same "generation" work fine...I can access them and
> build views and so on. The only difference in the dbs is the data.
> The problem one is the first one I started using, before I decided to
> manually shard the data. All the dbs have identical design docs and
> all that. My wild ass guess is that it's something I did early on in
> the early batches of model runs polluting the db.
>
> After a week of fighting this (my awareness of the scope of the
> problem built slowly!), I'm thinking it might be easier to just
> re-run my models, and re-generate the data...at least then the problem
> is just CPU time.
>
> Thanks for the advice.
>
> James

Hi James,

I can share your pain: in my practice I have had a lot of broken databases
that acted in a similar way. Most of them were under heavy concurrent write
load: a "deadman's" database receiving data via replications from 2-3
sources, plus massive bulk updates, plus triggered _update handlers, all at
the same time. And the server always ran with *delayed_commits: true*. They
failed with various symptoms: from explicit badrecord_db errors in the logs,
or random weird crash reports in the middle of data processing, to forcing
CouchDB to consume all system memory without stopping. However, I have never
hit such problems without delayed_commits. Do you also have this option
enabled? (You can check it with the commands at the end of this message.)

> After a week of fighting this (my awareness of the scope of the
> problem built slowly!), I'm thinking it might be easier to just
> re-run my models, and re-generate the data...at least then the problem
> is just CPU time.

Yes, this is the easiest way to work around it. It is also better to keep a
backup copy somewhere for emergency cases.
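In case it helps, here is a quick way to check (and disable) delayed_commits
on a 1.x node. This is just a sketch: it assumes the node listens on the
default port 5984 and has no admin credentials configured (add -u admin:pass
otherwise):

    # read the current value ("true" or "false")
    curl http://localhost:5984/_config/couchdb/delayed_commits

    # turn it off at runtime; returns the previous value
    curl -X PUT http://localhost:5984/_config/couchdb/delayed_commits -d '"false"'

To make it permanent, set it in local.ini:

    [couchdb]
    delayed_commits = false

For the backup copy: with 1.x every database is a single append-only .couch
file under database_dir, so copying that file is enough. The paths below are
just an example, your install may keep them elsewhere:

    cp /var/lib/couchdb/mydatabase.couch /backups/mydatabase.couch

--
,,,^..^,,,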