couchdb-dev mailing list archives

From "Dave Cottlehuber (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COUCHDB-1946) Trying to replicate NPM grinds to a halt after 40GB
Date Mon, 09 Dec 2013 09:55:07 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843018#comment-13843018 ]

Dave Cottlehuber commented on COUCHDB-1946:
-------------------------------------------

[~stelcheck] agreed
[~thor.lange]

There's something about replicating this specific document (as-stream) that seems to trigger the issue.
Here's what I used to identify it (query the source DB's _changes feed with since=<checkpoint - 1>):

    http://isaacs.iriscouch.com/registry/_changes?limit=2&since=701251
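A minimal Python sketch of the same lookup, for anyone who wants to script it. It assumes CouchDB 1.x integer sequence numbers and that `_changes` returns only rows whose seq is greater than `since`. The function names are hypothetical and nothing here handles auth or retries.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen


def changes_url(db_url: str, checkpoint: int, limit: int = 2) -> str:
    """Build a _changes URL that starts just before a replication checkpoint."""
    # _changes returns rows *after* `since`, so since = checkpoint - 1 makes
    # the first row the earliest change whose seq is >= checkpoint.
    query = urlencode({"limit": limit, "since": checkpoint - 1})
    return f"{db_url.rstrip('/')}/_changes?{query}"


def doc_ids_from_changes(body: str) -> List[str]:
    """Pull the document IDs out of a _changes response body."""
    return [row["id"] for row in json.loads(body)["results"]]


def fetch_changed_ids(db_url: str, checkpoint: int, limit: int = 2) -> List[str]:
    """Query the source DB and return the IDs found just after the checkpoint."""
    with urlopen(changes_url(db_url, checkpoint, limit)) as resp:
        return doc_ids_from_changes(resp.read().decode("utf-8"))
```

For example, `fetch_changed_ids("http://isaacs.iriscouch.com/registry", 701252)` requests the same URL as above (since=701251). The checkpoint value 701252 is only inferred from that URL. Read the real one from your own replication log.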

Here are some things you can try:

# option 1

- delete all existing replications
- compact your DB if there's a big difference between its data size and its on-disk size. jq is awesome
for checking this:

curl -s http://localhost:5984/registry | jq ' (.disk_size| tonumber) - (.data_size |tonumber)'

    http://stedolan.github.io/jq/
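If you'd rather not depend on jq, here is a rough Python equivalent of the check, plus the compaction call itself. It assumes a CouchDB 1.x database info object with `disk_size`, `data_size` and `compact_running` fields, and admin access for `POST /{db}/_compact`. The 0.5 threshold and all function names are hypothetical, and credential handling is left out.

```python
import json
from urllib.request import Request, urlopen


def get_db_info(base_url: str, db: str) -> dict:
    """GET /{db}, the same request the curl one-liner makes."""
    with urlopen(f"{base_url.rstrip('/')}/{db}") as resp:
        return json.loads(resp.read().decode("utf-8"))


def wasted_bytes(info: dict) -> int:
    """Same arithmetic as the jq filter: disk_size - data_size."""
    return int(info["disk_size"]) - int(info["data_size"])


def should_compact(info: dict, threshold: float = 0.5) -> bool:
    """Hypothetical rule of thumb: compact when more than `threshold`
    of the file on disk is not live data."""
    disk = int(info["disk_size"])
    return disk > 0 and wasted_bytes(info) / disk > threshold


def is_compacting(info: dict) -> bool:
    """The database info object reports a running compaction via compact_running."""
    return bool(info.get("compact_running", False))


def start_compaction(base_url: str, db: str) -> dict:
    """POST /{db}/_compact; needs admin rights and a JSON Content-Type."""
    req = Request(
        f"{base_url.rstrip('/')}/{db}/_compact",
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would look like `info = get_db_info("http://localhost:5984", "registry")`, then `start_compaction(...)` if `should_compact(info)` is true, and then re-fetching the info until `is_compacting(info)` is false before copying registry.couch.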

This is a good spot to copy the registry.couch file if you have space, in case you need to
revert to it.

-  replicate the single failing document by POSTing this to _replicator. This could take a
*while*.

{{code}}
{
   "source": "http://isaacs.iriscouch.com/registry",
   "target": "registry",
   "doc_ids": [
       "as-stream"
   ],
   "owner": "admin"
}
{{code}}

- this replicates only the single stuck document. If you do this, I would love an ngrep
or tcpdump capture of the traffic to see what happens on the wire during these stuck transfers.

- once this is completed, you can then run the normal replication again.
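Scripted, the two steps of option 1 might look roughly like the sketch below. It assumes CouchDB 1.x, where the replicator writes a `_replication_state` field ("triggered", "completed" or "error") back into each `_replicator` document. The `"owner": "admin"` value is copied from the document above. The function names and the base URL you pass in are hypothetical, and authentication is not handled.

```python
import json
from typing import List, Optional
from urllib.parse import quote
from urllib.request import Request, urlopen


def replication_doc(source: str, target: str, doc_ids: Optional[List[str]] = None) -> dict:
    """Build a _replicator document; with doc_ids it copies only those docs."""
    doc = {"source": source, "target": target, "owner": "admin"}
    if doc_ids:
        doc["doc_ids"] = list(doc_ids)
    return doc


def post_replication(base_url: str, doc: dict) -> dict:
    """POST the document to the _replicator database; the reply carries ok, id, rev."""
    req = Request(
        f"{base_url.rstrip('/')}/_replicator",
        data=json.dumps(doc).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def replication_state(base_url: str, doc_id: str) -> Optional[str]:
    """Read the _replication_state field the replicator writes back into the doc."""
    url = f"{base_url.rstrip('/')}/_replicator/{quote(doc_id, safe='')}"
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8")).get("_replication_state")


def is_finished(state: Optional[str]) -> bool:
    """A one-shot replication ends in "completed" or "error"; "triggered" means still running."""
    return state in ("completed", "error")
```

The idea is to post `replication_doc(source, "registry", ["as-stream"])`, poll `replication_state` with the returned `id` until `is_finished` is true, and only then post `replication_doc(source, "registry")` for the full run.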

# option 2

Install an older release of CouchDB and see whether it also gets stuck at this point:

https://archive.apache.org/dist/couchdb/binary/win/1.2.2/

If you *can*, please try the R15B03-1 build first, report back, and then try the R14B04 one.
It's not yet clear to me if the issue we are seeing is also related to garbage collection
differences in Erlang/OTP between releases, or solely within CouchDB.

# option 3

Sometime later (hopefully today), I should have a BitTorrent-accessible version of the npm registry. I
need to update & compact it first, which is pretty much IO-limited :-).


> Trying to replicate NPM grinds to a halt after 40GB
> ---------------------------------------------------
>
>                 Key: COUCHDB-1946
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1946
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>            Reporter: Marc Trudel
>         Attachments: couch.log
>
>
> I have been able to replicate the Node.js NPM database until 40G or so, then I get this:
> https://gist.github.com/stelcheck/7723362
> In one case I got a flat-out OOM error, but I didn't take a dump of the log output
> at the time.
> CentOS 6.4 with CouchDB 1.5 (also tried 1.3.1, but to no avail). I also tried restarting
> replication from scratch twice; both times it stalled at 40GB.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
