From: Adam Kocoloski
To: dev@couchdb.apache.org
Subject: Re: Attachment Replication Problem - Bug Found
Date: Sat, 16 May 2009 11:07:16 -0400

Hi Antony,

On May 16, 2009, at 10:39 AM, Antony Blakey wrote:

> I can confirm that the target and source of replicated resources
> affected by this issue are identical with this fix, and both are
> correct i.e. uncorrupted, although this is only according to the
> failures I've seen.

Thanks! Makes me feel better, at least.

>> Now, on to the checkpointing conditions. I think there's some
>> confusion about the attachment workflow. Attachments are
>> downloaded _immediately_ and in their entirety by ibrowse, which
>> then sends the data as 1MB binary chunks to the attachment receiver
>> processes.
>
> Are they downloaded to disk by ibrowse?

No, I don't believe so. ibrowse accepts a {stream_to, pid()} option.
It accumulates packets until it reaches a threshold configurable by
{stream_chunk_size, integer()} (default 1MB), then sends the data to
the Pid. I don't think ibrowse is writing to disk at any point in the
process. We do see that when streaming really large attachments,
ibrowse becomes the biggest memory user in the emulator. ibrowse does
offer a {save_response_to_file, boolean()|filename()} option that we
could possibly leverage.

>> In another thread Matt Goodall suggested checkpointing after a
>> certain amount of time has passed. So we'd have a checkpointing
>> algo that considers
>>
>> * memory utilization
>> * number of pending writes
>> * time elapsed
>
> That seems to cover both resource usage and incremental progress. As
> far as the couch_util:should_flush mechanism is concerned, I think a
> good idea would be to commit 1 document, then 2, then 4 i.e. a
> binary increasing window which adapts well to both unreliable and
> reliable connections without requiring configuration, which is
> tricky because you may want to run the system in a variety of
> scenarios, and you might not know what the failure characteristics
> are (and they may change over time).

It sounds like a good idea. I had thought about doing the same for the
process that pulls new docs from the source server, so that we could
do a better job of filling up the pipes when we're dealing with the
common case of small documents without significant attachment data.

> While we're on this - any idea about why couchdb is quitting during
> replication? It's not giving me any errors.

Errm, no, I'm afraid I don't have any idea there. I remember one or
two other reports in JIRA that sound similar, but I've not been able
to reproduce them. Are you keeping an eye on the memory usage? I
think an out-of-memory error can trigger this sudden death in Erlang.
Sorry, that's the best I've got at the moment.

Adam
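
For concreteness, the checkpointing conditions discussed above (memory
utilization, number of pending writes, time elapsed) combined with the
doubling commit window could be sketched roughly as follows. This is an
illustration in Python, not CouchDB or couch_util code: the class name
CheckpointPolicy, every threshold value, the cap on the window, and the
rule that a failure resets the window to one document are all
assumptions made for the sake of the example.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CheckpointPolicy:
    """Hypothetical policy deciding when a replicator should checkpoint.

    All thresholds are illustrative assumptions, not CouchDB defaults.
    """
    max_buffered_bytes: int = 10 * 1024 * 1024   # memory-utilization limit
    max_interval_secs: float = 60.0              # time-elapsed limit
    max_window: int = 1024                       # assumed cap on the window
    clock: Callable[[], float] = time.monotonic  # injectable for testing

    window: int = field(default=1, init=False)   # pending-writes limit: 1, 2, 4, ...
    pending_docs: int = field(default=0, init=False)
    buffered_bytes: int = field(default=0, init=False)
    last_checkpoint: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self.last_checkpoint = self.clock()

    def add(self, doc_size: int) -> None:
        """Record one document (of doc_size bytes) waiting to be written."""
        self.pending_docs += 1
        self.buffered_bytes += doc_size

    def should_checkpoint(self) -> bool:
        """True if any of the three conditions is met and work is pending."""
        if self.pending_docs == 0:
            return False
        return (
            self.buffered_bytes >= self.max_buffered_bytes
            or self.pending_docs >= self.window
            or self.clock() - self.last_checkpoint >= self.max_interval_secs
        )

    def committed(self) -> None:
        """After a successful checkpoint: double the window, reset counters."""
        self.window = min(self.window * 2, self.max_window)
        self._reset()

    def failed(self) -> None:
        """After a failure (assumed rule): fall back to a one-document window."""
        self.window = 1
        self._reset()

    def _reset(self) -> None:
        self.pending_docs = 0
        self.buffered_bytes = 0
        self.last_checkpoint = self.clock()
```

A replicator loop would call add() for each document it buffers, check
should_checkpoint() afterwards, and report the outcome with committed()
or failed(). On a reliable connection the window grows quickly, so
small documents are batched; on an unreliable one it stays small, so
little progress is lost per failure.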