couchdb-user mailing list archives

From Alexander Shorin <kxe...@gmail.com>
Subject Re: CouchDB load spike (even with low traffic)?
Date Tue, 29 Apr 2014 09:34:57 GMT
Hi Marty,

thanks for following up! I see your problem, but here is what we need:

1. CouchDB stats graphs alongside your system disk, network and memory
ones. If you cannot share them publicly, feel free to send them to me
in private. We need to see how they are related: for instance, high
memory usage may be caused by uploading a large number of big files,
and you'll spot that easily by comparing the CouchDB, network and
memory graphs for the spike period. (A small polling sketch for
collecting the CouchDB side follows this list.)

2. CouchDB log entries for the spike event. Graphs can only show that
something is going wrong, and from them we can only guess (often we
guess right, but without much precision) what exactly that is. The
logs will help us find the actual requests that cause the memory
spike. (A second sketch below pulls just the spike window out of the
log.)
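
If you don't already have a collector wired up for point 1, here is a
minimal sketch of the idea (assumptions: CouchDB 1.x on
http://localhost:5984, /_stats readable without auth, Python 3
standard library only; the file name and CSV layout are just for
illustration). It snapshots /_stats once a minute so the numbers can
be lined up against your disk, network and memory graphs:

# stats_poller.py - minimal sketch, not a drop-in tool.
# Assumes CouchDB 1.x on http://localhost:5984 with /_stats readable anonymously.
import csv
import json
import time
import urllib.request

COUCH = "http://localhost:5984"

def snapshot():
    """Fetch /_stats and flatten it to {"group.metric": current value}."""
    with urllib.request.urlopen(COUCH + "/_stats") as resp:
        stats = json.load(resp)
    flat = {}
    for group, metrics in stats.items():
        for name, values in metrics.items():
            flat["%s.%s" % (group, name)] = values.get("current")
    return flat

def main(path="couchdb_stats.csv", interval=60):
    with open(path, "a", newline="") as fh:
        writer = csv.writer(fh)
        while True:
            ts = time.strftime("%Y-%m-%dT%H:%M:%S")
            for metric, value in sorted(snapshot().items()):
                writer.writerow([ts, metric, value])
            fh.flush()
            time.sleep(interval)

if __name__ == "__main__":
    main()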

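And for point 2, once the graphs tell you the spike window, something
like this can pull just that window out of couch.log (a sketch that
assumes the default CouchDB 1.x log format, where every line starts
with a bracketed timestamp such as [Tue, 29 Apr 2014 09:34:57 GMT]):

# log_window.py - minimal sketch; the file name and CLI shape are just
# for illustration. Prints only the couch.log lines whose leading
# timestamp falls inside the given window.
import sys
from datetime import datetime

FMT = "%a, %d %b %Y %H:%M:%S GMT"

def in_window(line, start, end):
    if not line.startswith("["):
        return False
    try:
        stamp = datetime.strptime(line[1:line.index("]")], FMT)
    except ValueError:
        return False
    return start <= stamp <= end

if __name__ == "__main__":
    # e.g. python log_window.py couch.log \
    #        "Tue, 29 Apr 2014 09:00:00 GMT" "Tue, 29 Apr 2014 09:30:00 GMT"
    path, start_s, end_s = sys.argv[1:4]
    start = datetime.strptime(start_s, FMT)
    end = datetime.strptime(end_s, FMT)
    with open(path) as fh:
        for line in fh:
            if in_window(line, start, end):
                sys.stdout.write(line)
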
After that we can start thinking about the problem. For instance, if
the spikes happen due to large attachment uploads, there isn't much we
can do. On the other hand, the query server can easily eat quite a big
chunk of memory. We'll notice that by monitoring the /_active_tasks
resource (if the problem is in the views) or by looking through the
logs for the spike period, and that case can be fixed.
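
A quick way to catch the view case in the act is to poll
/_active_tasks during the spike; a sketch in the same spirit (same
assumptions as above: CouchDB 1.x on localhost:5984, stdlib-only
Python) is below. Continuous replications show up there too, so it
will also tell you whether your west-to-east replication is doing
anything unusual at spike time:

# active_tasks_poller.py - minimal sketch, same assumptions as the
# /_stats one. Prints whatever CouchDB reports in /_active_tasks
# (indexers, compactions, replications).
import json
import time
import urllib.request

COUCH = "http://localhost:5984"

def active_tasks():
    with urllib.request.urlopen(COUCH + "/_active_tasks") as resp:
        return json.load(resp)

if __name__ == "__main__":
    while True:
        ts = time.strftime("%Y-%m-%dT%H:%M:%S")
        tasks = active_tasks()
        if not tasks:
            print(ts, "no active tasks")
        for task in tasks:
            # "type" is e.g. indexer / replication / database_compaction;
            # indexer tasks carry "database" and "progress", replications
            # carry "source" and "target".
            print(ts, task.get("type"),
                  task.get("database") or task.get("source"),
                  "progress:", task.get("progress"))
        time.sleep(10)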

Not sure which tools you're using for monitoring and drawing graphs,
but take a look at these projects:
- https://github.com/gws/munin-plugin-couchdb - a Munin plugin for
CouchDB monitoring. Unfortunately, it doesn't yet handle system
metrics for the CouchDB process - I'll add that during this week -
but make sure you have a similar plugin for your monitoring system.
- https://github.com/etsy/skyline - an anomaly detector; spikes are
exactly the kind of anomaly it is meant to catch.
- https://github.com/etsy/oculus - a metrics correlation tool. It
makes it very easy to compare multiple graphs for the anomaly period.

--
,,,^..^,,,


On Tue, Apr 29, 2014 at 8:15 AM, Marty Hu <marty.hu@gmail.com> wrote:
> We've been running CouchDB v1.5.0 on AWS and it's been working fine.
> Recently AWS came out with new prices for their new m3 instances, so we
> switched our CouchDB instance to an m3.large. We have a relatively
> small database with < 10GB of data in it.
>
> Our steady-state metrics for it are a system load of 0.2 and memory usage
> of about 5%. However, we noticed that every few hours (3-4 times per day)
> we get a huge spike that pushes our load to 1.5 or so and memory usage to
> close to 100%.
>
> We don't run any cronjobs that involve the database, and our traffic is
> about the same over the day. We do run a continuous replication from one
> database on the west coast to another on the east coast.
>
> This has been stumping me for a bit - any ideas?
