couchdb-replication mailing list archives

From: Paul Davis
Subject: Re: Replication of attachment is extremely slow
Date: Fri, 24 Jan 2014 18:18:11 GMT

If you can reproduce this, the first thing I'd look at during a slow
replication is the output of "sudo netstat -tanp tcp", to see whether
you're bumping up against open socket limits.

On Fri, Jan 24, 2014 at 7:40 AM, Scott Weber <> wrote:
> I appreciate the digging, but in the case of the test file we were using,
> it is some text that doesn't have dashes or newlines, mixed with image data
> which are big binary blobs.
> So strings that look like mime boundaries aren't likely to be present.
> -Scott
> ----- Original Message -----
> From: Nick North <>
> To: "" <>;
> Cc:
> Sent: Friday, January 24, 2014 9:28 AM
> Subject: Re: Replication of attachment is extremely slow
> On 24 January 2014 15:01, Jens Alfke <> wrote:
>> On Jan 24, 2014, at 5:06 AM, Nick North <> wrote:
>> > I'm not really expecting this problem to be the cause of the slowdown:
>> > the attachment needs to contain a lot of initial prefixes of the MIME
>> > boundary string for things to be really bad.
>> This is on the reading side, where the MIME parser is looking for the
>> boundary string that signals the end of the attachment part?
>> But the boundary string has to appear after a CRLF, so the actual sequence
>> to search for starts with "\r\n--". I'd expect the slowdown to happen only
>> if the data contains a lot of those sequences, not just any old hyphens.
>> (Also, that search is really slow enough to be noticeable?! Doesn't Erlang
>> have a native string-search primitive?)
>> —Jens
>> PS: Maybe we should move this thread to the new replication mailing list :)
> Copied to the replication list (though without all the preceding posts,
> with their mix of top and bottom posting).
> I don't have the code in front of me, but what you say about the search
> string sounds right, so apologies for the error. However, that makes things
> worse: the current code searches each 4KB block of the attachment for any
> initial prefix of the boundary sequence. If it finds a prefix, but not the
> whole string, it passes the block up to that point through, and starts
> searching again from about the place where the prefix was found, on the
> remainder of the original block, plus the next 4KB appended to the end. So,
> if the boundary sequence begins with "\r", then every occurrence of "\r"
> will slow it down, by causing boundary sequence searching to start again
> from where it occurs, with a larger piece of attachment to search. "\r" is
> probably more common than "-", making the problem more likely to pop up.
> Nick
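
To make the cost Nick describes concrete, here is a rough Python sketch. It is
a toy model of the behaviour as described in his message, not a port of
CouchDB's Erlang code. The function name rescan_work, the "work" metric (bytes
held in the search buffer each time a search starts), and the example boundary
string are all assumptions made for illustration.

```python
def rescan_work(data: bytes, boundary: bytes, block: int = 4096) -> int:
    """Toy model of the rescanning search described above.

    Returns the total number of bytes sitting in the search buffer summed
    over every search (re)start. Hypothetical metric, not a real profile.
    """
    first = boundary[:1]   # shortest possible "initial prefix" of the boundary
    work = 0
    buf = b""
    pos = 0                # next unread offset in data
    skip = 0               # where in buf the next search begins

    while True:
        # Append the next block (if any) to whatever is left in the buffer.
        if pos < len(data):
            buf += data[pos:pos + block]
            pos += block
        if not buf:
            break
        work += len(buf)

        i = buf.find(first, skip)
        if i < 0:
            # No candidate at all: the whole buffer is passed through.
            buf, skip = b"", 0
            if pos >= len(data):
                break
            continue
        if buf.startswith(boundary, i):
            break          # real boundary found, attachment part ends here

        # False prefix: pass buf[:i] through, keep the remainder, and
        # restart the search just past the rejected byte.
        buf, skip = buf[i:], 1

    return work


if __name__ == "__main__":
    boundary = b"\r\n--abc123"          # made-up boundary sequence
    clean = b"\x00" * 16384             # no "\r" anywhere
    noisy = b"\r" * 16384               # worst case: every byte is a prefix
    print("clean:", rescan_work(clean, boundary))
    print("noisy:", rescan_work(noisy, boundary))
```

In this model, data with no "\r" is handled once per block, so the work equals
the data size, while data full of "\r" keeps a growing buffer alive and the
work grows roughly with the square of the data size. Real attachments sit
somewhere in between.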
