couchdb-dev mailing list archives

From "Adam Kocoloski (JIRA)" <>
Subject [jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB
Date Thu, 05 Dec 2013 20:49:36 GMT


Adam Kocoloski commented on COUCHDB-1946:

[~dch] was kind enough to allow me onto his VM this afternoon to poke around. Thanks Dave!
Here's what I found:

# The memory utilization is all taken up in refc binaries, not processes.
# The binaries are mostly attached to couch_stream processes.
# Adding a hibernate after each write by the stream process goes a _long_ way towards stabilizing
memory usage.

For posterity here's how you sum up the size of all binaries attached to a process P:

BinMem = fun(P) ->
    case process_info(P, binary) of
        {binary, Bins} -> lists:sum([Size || {_, Size, _} <- Bins]);
        _ -> 0
    end
end.
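A sketch of applying that fun across every process on the node to find the biggest binary holders (the variable names are mine; note that a refc binary referenced by several processes is counted once per process, so the per-process totals can overlap):

```erlang
%% Sum the sizes of refc binaries referenced by each process and list
%% the five processes holding the most binary memory.
BinMem = fun(P) ->
    case process_info(P, binary) of
        {binary, Bins} -> lists:sum([Size || {_, Size, _} <- Bins]);
        _ -> 0
    end
end,
ByMem = lists:reverse(lists:keysort(2, [{P, BinMem(P)} || P <- processes()])),
Top5 = lists:sublist(ByMem, 5),
io:format("~p~n", [Top5]).
```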

and here's the diff against 1.5.0 to cause couch_stream to hibernate after each write:

diff --git a/src/couchdb/couch_stream.erl b/src/couchdb/couch_stream.erl
index 959feef..4067ff7 100644
--- a/src/couchdb/couch_stream.erl
+++ b/src/couchdb/couch_stream.erl
@@ -255,7 +255,7 @@ handle_call({write, Bin}, _From, Stream) ->
-                        identity_len=IdenLen + BinSize}};
+                        identity_len=IdenLen + BinSize}, hibernate};
     true ->
         {reply, ok, Stream#stream{

Adding that patch _will_ cause an increase in CPU consumption when writing attachments.  There
may well be more subtle changes (e.g. playing with the {{fullsweep_after}} option when starting
the streamer) that could achieve stability with fewer CPU cycles.
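For reference, {{fullsweep_after}} is a standard {{spawn_opt}} flag; a self-contained sketch of the idea (the value 20 is arbitrary, and in couch_stream it would go wherever the stream process is actually spawned):

```erlang
%% Hypothetical: start a process that does a full-sweep GC after 20
%% minor collections instead of the default 65535. Full sweeps are what
%% actually drop a process's references to refc binaries.
Pid = erlang:spawn_opt(fun() -> receive stop -> ok end end,
                       [{fullsweep_after, 20}]),
Pid ! stop.
```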

I should also note that while memory usage is far more stable it is still sitting at 2.2 GB
RES right now and seems to be gradually climbing over time, so don't go replicating to a t1.micro
instance just yet.  I think we do have a relatively good understanding of what's going on
at this point, though.

> Trying to replicate NPM grinds to a halt after 40GB
> ---------------------------------------------------
>                 Key: COUCHDB-1946
>                 URL:
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>            Reporter: Marc Trudel
>         Attachments: couch.log
> I have been able to replicate the Node.js NPM database up to 40 GB or so, then I get this:
> In one case I have gotten a flat-out OOM error, but I didn't take a dump of the log output
at the time.
> CentOS 6.4 with CouchDB 1.5 (also tried 1.3.1, to no avail). I also tried restarting
replication from scratch, twice, with both cases stalling at 40 GB.

This message was sent by Atlassian JIRA
