couchdb-replication mailing list archives

From Scott Weber <scotty2...@sbcglobal.net>
Subject Re: Replication of attachment is extremely slow.. LOGGED INFORMATION
Date Fri, 24 Jan 2014 19:34:37 GMT
We have reproduced the problem on a clean installation, and there are some interesting
things in the log, but I don't know what they mean, since I am not familiar with the internals
of CouchDB.

I have attached the couch log. I can send the actual file being replicated, but it is about
4 MB, too big to be a reasonable attachment, and I don't think it would be of value for
this issue.

The log is broken into three sections (tests), which I will outline below.
In all cases, the test first deletes all databases, then creates them, and adds a doc with an attachment.
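For anyone trying to reproduce this, the per-test setup could be scripted roughly as follows. This is a minimal Python sketch, not the script we actually used: it assumes an unauthenticated CouchDB at http://localhost:5984, and the document id "test" and attachment name "file.bin" are hypothetical placeholders. The HTTP call is passed in as a parameter so the sequence of requests can be checked without a running server.

```python
import json
import urllib.error
import urllib.request

# Assumption: an unauthenticated ("admin party") CouchDB node at this address.
BASE = "http://localhost:5984"


def http_call(method, path, body=None, content_type="application/json"):
    """Send one request to CouchDB and return (status, decoded JSON body)."""
    req = urllib.request.Request(BASE + path, data=body, method=method)
    if body is not None:
        req.add_header("Content-Type", content_type)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.loads(resp.read() or b"{}")
    except urllib.error.HTTPError as err:
        # e.g. 404 when deleting a database that does not exist yet
        return err.code, json.loads(err.read() or b"{}")


def run_setup(attachment, call=http_call, doc_id="test", att_name="file.bin"):
    """Delete and recreate both databases, then add a doc plus attachment.

    doc_id and att_name are hypothetical placeholders."""
    for db in ("from", "to"):
        call("DELETE", f"/{db}")
    for db in ("from", "to"):
        call("PUT", f"/{db}")
    _, doc = call("PUT", f"/from/{doc_id}", b"{}")
    # A standalone attachment upload must name the document's current revision.
    call("PUT", f"/from/{doc_id}/{att_name}?rev={doc['rev']}",
         attachment, "application/octet-stream")
```

Usage would be something like run_setup(data), where data holds the bytes of the ~4 MB test file; the octet-stream content type is also an assumption.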
<1>
The first is where replication was requested via an HTTP POST from a script.
The attachment is uploaded at line 36 (content length 4185522 bytes).
The "_replicate" request is sent at line 49.
Then a bunch of "Minor error in HTTP request" messages appear. I'm not sure what that means.
Then, starting with the gap between lines 328 and 329, you can see 5-second gaps as it tries to
"GET" from the "from" database, and a bunch of "New task status" messages are repeated (about 15 of them).
This repeats, with 5-second pauses, until line 740, where it logs "POST /to/_ensure_full_commit 201".
TOTAL TIME:  16:03:03 to 16:04:16   One minute, thirteen seconds.
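The 5-second gaps can be located mechanically rather than by eye. This is a small Python sketch that assumes the CouchDB 1.x log format, where each entry begins with a bracketed GMT timestamp such as "[Fri, 24 Jan 2014 16:03:03 GMT]"; lines without a timestamp (for example, continuations of multi-line entries) are skipped. The file name in the usage comment is a placeholder.

```python
import re
from datetime import datetime

# Assumption: entries start like "[Fri, 24 Jan 2014 16:03:03 GMT] [info] [<0.1.0>] ..."
TS = re.compile(r"^\[(\w{3}, \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2}) GMT\]")


def find_gaps(lines, min_seconds=5):
    """Yield (line_no, gap_seconds) for each timestamped line that follows
    the previous timestamped line by at least min_seconds."""
    prev = None
    for no, line in enumerate(lines, start=1):
        m = TS.match(line)
        if not m:
            continue  # no leading timestamp on this line
        ts = datetime.strptime(m.group(1), "%a, %d %b %Y %H:%M:%S")
        if prev is not None:
            gap = (ts - prev).total_seconds()
            if gap >= min_seconds:
                yield no, gap
        prev = ts

# Usage (placeholder path):
# with open("couch.log") as fh:
#     for line_no, gap in find_gaps(fh):
#         print(f"line {line_no}: {gap:.0f}s after previous entry")
```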

<2>
The next test was done using curl. Starting at line 770, it deletes the databases and starts
over.
At line 818, there is the PUT request to the _replicator db. (NOTE: This is "_replicator",
not "_replicate". What is the difference?)
There are only 2 "New task status" messages, and the replication is done by line 911.
TOTAL TIME:  16:17:28 to 16:17:30   2 seconds.
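On "_replicate" versus "_replicator": they are two different ways to start a replication. "/_replicate" is an HTTP endpoint, documented as taking a POST, and for a non-continuous job the response normally comes back only once the replication has finished. "_replicator" is a database; saving a document with the same fields into it asks CouchDB's replication manager to run the job in the background, and progress is recorded in the document's "_replication_state" field. The Python sketch below only builds the two requests (nothing is sent); the local address and the document id are assumptions.

```python
import json
import urllib.request

BASE = "http://localhost:5984"  # assumption: local, unauthenticated node


def replicate_request(source, target, **options):
    """One-shot replication via the /_replicate endpoint (POST)."""
    body = {"source": source, "target": target, **options}
    return urllib.request.Request(
        f"{BASE}/_replicate",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def replicator_doc_request(doc_id, source, target, **options):
    """Replication job stored as a document in the _replicator database (PUT)."""
    body = {"source": source, "target": target, **options}
    return urllib.request.Request(
        f"{BASE}/_replicator/{doc_id}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# Either request is sent with urllib.request.urlopen(req). The /_replicate
# call for a non-continuous job is expected to return after the replication
# finishes; the _replicator PUT returns once the document is saved, and the
# job's state then shows up in that document's _replication_state field.
```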

<3>
The next test was done using curl as well. It is a repeat of the second test, except that the replication
request was sent to "_replicate" rather than "_replicator", just like the first test.
It starts at line 912 and looks identical to test 2 in every respect.
It took two seconds, and again there were only 2 "New task status" messages.

So the only difference we see is that the script sent a different User-Agent header
(along with a few other minor header differences), and it posts a replication request whose JSON is:
   {"_id" : "test", 
    "source" : "http://localhost:5984/from",
    "target" : "http://localhost:5984/to", 
    "create_target" : false, 
    "continuous" : false }

That is slightly more comprehensive than the curl JSON, which is just:
     {"source":"from","target":"to"}
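Besides the User-Agent header, the two bodies differ in how they name the databases: the script passes full URLs, while the curl body passes bare names. As far as I know, CouchDB 1.x opens bare names as local databases but replicates full URLs over HTTP even on the same node, which could be related to the "GET" requests against "from" logged in test 1. One way to separate the two variables is to send each body with each User-Agent. The Python sketch below builds such POST /_replicate requests and times one; the User-Agent strings are whatever you choose to pass (urllib sends its own "Python-urllib/3.x" when none is set), and the databases should be reset between runs.

```python
import json
import time
import urllib.request

# The two request bodies quoted above.
SCRIPT_BODY = {"_id": "test",
               "source": "http://localhost:5984/from",
               "target": "http://localhost:5984/to",
               "create_target": False,
               "continuous": False}
CURL_BODY = {"source": "from", "target": "to"}


def build(body, user_agent=None, base="http://localhost:5984"):
    """Build a POST /_replicate request, optionally overriding User-Agent."""
    headers = {"Content-Type": "application/json"}
    if user_agent is not None:
        headers["User-Agent"] = user_agent
    return urllib.request.Request(f"{base}/_replicate",
                                  data=json.dumps(body).encode(),
                                  headers=headers, method="POST")


def timed(req, opener=urllib.request.urlopen):
    """Send the request, read the full response, and return elapsed seconds."""
    start = time.monotonic()
    with opener(req) as resp:
        resp.read()
    return time.monotonic() - start

# Example matrix (run against a freshly reset server each time):
# for body in (SCRIPT_BODY, CURL_BODY):
#     for ua in (None, "curl/7.x"):  # "curl/7.x" is a placeholder value
#         print(body is SCRIPT_BODY, ua, timed(build(body, ua)))
```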

But these differences should not make the replication take roughly 30 times as long, should they?
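For reference, the ratio works out from the logged times as follows (a trivial Python check). Because the log timestamps have one-second resolution, the 2-second figure for the curl runs is coarse, so the ratio is only approximate.

```python
from datetime import datetime


def elapsed(start, end, fmt="%H:%M:%S"):
    """Seconds between two same-day wall-clock timestamps."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


slow = elapsed("16:03:03", "16:04:16")  # test 1 (script)
fast = elapsed("16:17:28", "16:17:30")  # test 2 (curl)
print(slow, fast, slow / fast)  # 73.0 2.0 36.5
```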

Any other ideas why one form of replication takes so much longer?

-Scott




----- Original Message -----
From: Paul Davis <paul.joseph.davis@gmail.com>
To: "user@couchdb.apache.org" <user@couchdb.apache.org>; Scott Weber <scotty2541@sbcglobal.net>
Cc: "replication@couchdb.apache.org" <replication@couchdb.apache.org>
Sent: Friday, January 24, 2014 12:18 PM
Subject: Re: Replication of attachment is extremely slow

If you can duplicate this the first thing I'd look at during a slow
replication is "sudo netstat -tanp tcp" to see if you're maybe bumping
up against open socket limits.
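To follow that suggestion over the course of a slow run, the netstat output can be summarized per connection state and sampled repeatedly. This Python sketch assumes the Linux netstat column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and runs plain `netstat -tan`; the -p flag, which needs sudo, only adds process names. A large and growing TIME_WAIT or ESTABLISHED count on port 5984 would be consistent with the socket-limit theory.

```python
import subprocess
from collections import Counter


def couch_socket_states(netstat_output, port=5984):
    """Count TCP connection states for sockets whose local or foreign
    address ends in :port, given Linux `netstat -tan` output."""
    states = Counter()
    suffix = f":{port}"
    for line in netstat_output.splitlines():
        parts = line.split()
        # Skip banner and header lines; data lines start with tcp/tcp6.
        if len(parts) < 6 or not parts[0].startswith("tcp"):
            continue
        local, foreign, state = parts[3], parts[4], parts[5]
        if local.endswith(suffix) or foreign.endswith(suffix):
            states[state] += 1
    return states


def snapshot(port=5984):
    """Run netstat once and summarize CouchDB-related sockets."""
    out = subprocess.run(["netstat", "-tan"], capture_output=True,
                         text=True, check=True).stdout
    return couch_socket_states(out, port)
```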

On Fri, Jan 24, 2014 at 7:40 AM, Scott Weber <scotty2541@sbcglobal.net> wrote:
> I appreciate the digging, but in the case of the test file we were using, it is some
text that doesn't have dashes or newlines, mixed with image data which are big binary blobs.
>
> So strings that look like mime boundaries aren't likely to be present.
>
> -Scott