incubator-couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From kowsik <kow...@gmail.com>
Subject Re: CouchDB 1.1 issue
Date Fri, 02 Sep 2011 02:11:20 GMT
Wow, I'm shocked by the eerie silence on this. So I take it, there are
no clues in my prior emails to figure out why the replicator is
backing up and then dumping a 500,000 line stack trace?

Dunno if it helps, but here's what we see. The number of documents
between the two clusters will start to differ (meaning things are not
replicating fast enough) and then we'll see 100% CPU utilization one
of the them at the same time watching the memory utilization grow.
Could it be the geo-latency that's causing the problem?

Just to see if it makes a difference, we are moving our CouchDB
cluster to an m2.2xlarge instance (big honking instance with fast IO)
as well as using instance storage instead of EBS. Will report back on
what we see. But we definitely could use some help here.

Thanks,

K.
---
http://blitz.io
@pcapr

On Thu, Sep 1, 2011 at 7:29 AM, kowsik <kowsik@gmail.com> wrote:
> One more observation. It seems the memory goes up dramatically while
> the replicator task is writing all the failed-to-replicate-docs to the
> log (ends with this)
>
> ** Reason for termination ==
> ** {http_request_failed,<<"failed to replicate http://host/db">>}
>
> Is there a way to disable logging for the replicator? Interestingly
> enough, as soon as we restart, the replicator simply catches up and
> pretends there were no problems.
>
> K.
> ---
> http://blog.mudynamics.com
> http://blitz.io
> @pcapr
>
> On Thu, Sep 1, 2011 at 7:18 AM, kowsik <kowsik@gmail.com> wrote:
>> Right before I sent this email we restarted CouchDB and now it's at
>> 14% memory usage and climbing. Is there anything we can look at
>> stats-wise and see where the pressure in the system is? I realize task
>> stats are being added to trunk, but on 1.1, anything?
>>
>> Thanks,
>>
>> K.
>> ---
>> http://blog.mudynamics.com
>> http://blitz.io
>> @pcapr
>>
>> On Thu, Sep 1, 2011 at 6:35 AM, Scott Feinberg <feinberg.scott@gmail.com> wrote:
>>> I haven't had that issue-though I'm not using using 1.1 in a
>>> production environment, just using it to replicate like crazy (millions of
>>> docs in each of my 20+ databases).  I was running a server with 1 GB of
>>> memory and didn't have an issue, it handled it fine.
>>>
>>> However... from http://docs.couchbase.org/couchdb-release-1.1/index.html
>>>
>>> When you PUT/POST a document to the _replicator database, CouchDB will
>>> attempt to start the replication up to 10 times (configurable under
>>> [replicator], parameter max_replication_retry_count).
>>>
>>> Not sure if that helps.
>>>
>>> --Scott
>>>
>>> On Thu, Sep 1, 2011 at 9:28 AM, kowsik <kowsik@gmail.com> wrote:
>>>
>>>> Ran into this twice so far in production CouchDB in the last two days.
>>>> We are running CouchDB 1.1 on an EC2 AMI with multi-master replication
>>>> across two regions. I notice that every now and then CouchDB will
>>>> simply suck up 100% CPU 50% of the total memory and not respond at
>>>> all. So far the logs only show sporadic replication errors. One of the
>>>> stack traces (failed to replicate after 10 times) is about 500,000
>>>> lines long. We are using the _replicator database.
>>>>
>>>> Anyone else running into this? Since 1.1 doesn't have the
>>>> try-until-infinity-and-beyond mode, we have a worker task that watches
>>>> the _replication_state and kicks the replicator as soon as it errors
>>>> out. Are there any settings in terms replicator memory usage, etc that
>>>> could help us?
>>>>
>>>> Thanks!
>>>>
>>>> K.
>>>> ---
>>>> http://blog.mudynamics.com
>>>> http://blitz.io
>>>> @pcapr
>>>>
>>>
>>
>

Mime
View raw message