couchdb-replication mailing list archives

From Scott Weber <scotty2...@sbcglobal.net>
Subject Re: Replication of attachment is extremely slow
Date Fri, 24 Jan 2014 15:40:27 GMT
I appreciate the digging, but in the case of the test file we were using, it is some text that
doesn't have dashes or newlines, mixed with image data which are big binary blobs.

So strings that look like mime boundaries aren't likely to be present.

-Scott




----- Original Message -----
From: Nick North <north.n@gmail.com>
To: "user@couchdb.apache.org" <user@couchdb.apache.org>; replication@couchdb.apache.org
Cc: 
Sent: Friday, January 24, 2014 9:28 AM
Subject: Re: Replication of attachment is extremely slow

On 24 January 2014 15:01, Jens Alfke <jens@couchbase.com> wrote:

>
> On Jan 24, 2014, at 5:06 AM, Nick North <north.n@gmail.com> wrote:
>
> > I'm not really expecting this problem to be the cause of the slowdown:
> > the attachment needs to contain a lot of initial prefixes of the MIME
> > boundary string for things to be really bad.
>
> This is on the reading side, where the MIME parser is looking for the
> boundary string that signals the end of the attachment part?
> But the boundary string has to appear after a CRLF, so the actual sequence
> to search for starts with "\r\n--". I'd expect the slowdown to happen only
> if the data contains a lot of those sequences, not just any old hyphens.
>
> (Also, that search is really slow enough to be noticeable?! Doesn't Erlang
> have a native string-search primitive?)
>
> —Jens
>
> PS: Maybe we should move this thread to the new replication mailing list :)


Copied to the replication list (though without all the preceding posts,
with their top and bottom posting).

I don't have the code in front of me, but what you say about the search
string sounds right, so apologies for the error. However, that makes things
worse: the current code searches each 4KB block of the attachment for any
initial prefix of the boundary sequence. If it finds a prefix, but not the
whole string, it passes the block up to that point through, and starts
searching again from about the place where the prefix was found, on the
remainder of the original block, plus the next 4KB appended to the end. So,
if the boundary sequence begins with "\r", then every occurrence of "\r"
will slow it down, by causing boundary sequence searching to start again
from where it occurs, with a larger piece of attachment to search. "\r" is
probably more common than "-", making the problem more likely to pop up.
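
To make the cost concrete, here is a rough Python sketch of the behaviour described above. It is a toy model of my description only, not the CouchDB code: the function name `scan`, the pass counting, and the choice to skip one byte past a failed prefix are all assumptions made for illustration. Only the 4KB block size and the "\r\n--" start of the boundary sequence come from this thread.

```python
BLOCK_SIZE = 4096  # block size quoted in the message above


def scan(data: bytes, boundary: bytes, block_size: int = BLOCK_SIZE):
    """Toy model of the restart-on-prefix search (hypothetical, for illustration).

    Returns (payload_before_boundary, search_passes, bytes_scanned), where
    bytes_scanned adds up the buffer size presented to each search pass.
    """
    if not boundary:
        raise ValueError("boundary must be non-empty")
    blocks = (data[i:i + block_size] for i in range(0, len(data), block_size))
    out = bytearray()
    buf = next(blocks, b"")
    passes = 0
    bytes_scanned = 0
    while buf:
        passes += 1
        bytes_scanned += len(buf)
        # Even a one-byte prefix of the boundary counts as a hit.
        p = buf.find(boundary[:1])
        if p == -1:
            # No prefix anywhere: pass the buffer through, read the next block.
            out += buf
            buf = next(blocks, b"")
            continue
        candidate = buf[p:p + len(boundary)]
        if candidate == boundary:
            # Whole boundary found: everything before it is attachment data.
            out += buf[:p]
            break
        nxt = next(blocks, b"")
        if boundary.startswith(candidate):
            # The prefix runs off the end of the buffer: more data is needed.
            if not nxt:
                out += buf  # input ended inside a partial boundary
                break
            keep = p
        else:
            # False alarm: pass the data through, including the prefix byte.
            keep = p + 1
        out += buf[:keep]
        # Restart on the remainder plus the next block, so the buffer grows.
        buf = buf[keep:] + nxt
    return bytes(out), passes, bytes_scanned


if __name__ == "__main__":
    boundary = b"\r\n--abc123"  # made-up boundary string
    plain = b"a" * 20000 + boundary
    noisy = (b"a" * 9 + b"\r") * 2000 + boundary
    print(scan(plain, boundary)[1:])
    print(scan(noisy, boundary)[1:])
```

In this model both inputs carry 20,000 bytes of payload, but the one with a "\r" every ten bytes pulls in a fresh block on every false alarm, so nearly the whole attachment ends up buffered and is presented to the search again on each pass. How closely this matches the real code is something I would need to check against the source.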

Nick

