From dev-return-4600-apmail-couchdb-dev-archive=couchdb.apache.org@couchdb.apache.org Fri Jun 12 15:06:58 2009 Return-Path: Delivered-To: apmail-couchdb-dev-archive@www.apache.org Received: (qmail 52812 invoked from network); 12 Jun 2009 15:06:57 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 12 Jun 2009 15:06:57 -0000 Received: (qmail 30447 invoked by uid 500); 12 Jun 2009 15:07:03 -0000 Delivered-To: apmail-couchdb-dev-archive@couchdb.apache.org Received: (qmail 30261 invoked by uid 500); 12 Jun 2009 15:07:03 -0000 Mailing-List: contact dev-help@couchdb.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@couchdb.apache.org Delivered-To: mailing list dev@couchdb.apache.org Received: (qmail 30169 invoked by uid 99); 12 Jun 2009 15:06:56 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 12 Jun 2009 15:06:56 +0000 X-ASF-Spam-Status: No, hits=1.2 required=10.0 tests=FS_REPLICA,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of adam.kocoloski@gmail.com designates 209.85.221.171 as permitted sender) Received: from [209.85.221.171] (HELO mail-qy0-f171.google.com) (209.85.221.171) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 12 Jun 2009 15:06:45 +0000 Received: by qyk1 with SMTP id 1so1194364qyk.13 for ; Fri, 12 Jun 2009 08:06:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:message-id:from:to :in-reply-to:content-type:content-transfer-encoding:mime-version :subject:date:references:x-mailer; bh=bii0FUK1wHIylqIixc4D9S7RYq5bdXZU0gNbJwUnf54=; b=eMgiRvGP5p5zBW7qbCjHOD4AUwH3O4jw9ghVYtIJnXSDiSh4RlPieEpWhHymTW/Jzx /MLVl7CfJwPJjPWMrtHI1sOW9KAvHevf2HK0ZpGv5bGobIwtCSTg65j+B3xzy5a7AQMT axLsnx5okhV3AsSDaGvYiO6CICiKViZjGeasw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:message-id:from:to:in-reply-to:content-type :content-transfer-encoding:mime-version:subject:date:references :x-mailer; b=QbDf5WkP9MMyhLkFdrxGW/5J25XbOUTFr6oDT5QHfEo+nUFYnl5zGaXFRHLR2SK+Wg w5wmVAaThvN5lw6jN22K5HAbsjSySil5QWHKLT8Hp/XJmox1DQN/xPHPuzLLZpmLH09q 8oqO+ZFGkcEol8MCVqzE6GlX1MOVLxvRh1JPc= Received: by 10.224.11.136 with SMTP id t8mr4458274qat.33.1244819184952; Fri, 12 Jun 2009 08:06:24 -0700 (PDT) Received: from ?10.0.1.2? (c-66-31-20-188.hsd1.ma.comcast.net [66.31.20.188]) by mx.google.com with ESMTPS id 5sm1032279qwh.51.2009.06.12.08.06.23 (version=TLSv1/SSLv3 cipher=RC4-MD5); Fri, 12 Jun 2009 08:06:24 -0700 (PDT) Sender: Adam Kocoloski Message-Id: <72FAB380-3FAC-408A-927F-95CC15403992@apache.org> From: Adam Kocoloski To: dev@couchdb.apache.org In-Reply-To: Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit Mime-Version: 1.0 (Apple Message framework v935.3) Subject: Re: replication using _changes API Date: Fri, 12 Jun 2009 11:06:21 -0400 References: <56A7CF26-8B1D-4D98-A122-B5A77A55F337@apache.org> <5E34D6F6-51D1-43A3-96DA-8CAC00883A56@apache.org> <19EA47F1-BD09-4B3E-9036-165F8AED0374@apache.org> X-Mailer: Apple Mail (2.935.3) X-Virus-Checked: Checked by ClamAV on apache.org On Jun 12, 2009, at 10:59 AM, Paul Davis wrote: > On Fri, Jun 12, 2009 at 10:47 AM, Damien Katz > wrote: >> >> >> On Jun 12, 2009, at 8:59 AM, Adam Kocoloski wrote: >> >>> Hi Damien, I'm not sure I follow. My worry was that, if I built a >>> replicator which only queried _changes to get the list of updates, >>> I'd have >>> to be prepared to process a very large response. I thought one >>> smart way to >>> process this response was to throttle the download at the TCP >>> level by >>> putting the socket into passive mode. >> >> You will have a very large response, but you can stream it, >> processing one >> line at a time, then you discard the line and process the next. As >> long as >> the writer is using a blocking socket and the reader is only >> reading as much >> data as necessary to process a line, you never need to store much >> of the >> data in memory on either side. But it seems the HTTP client is >> buffering the >> data as it comes in, perhaps unintentionally. >> >> With TCP, the sending side will only send so much data before >> getting an >> ACK, acknowledgment that packets sent were actually received. When >> an ACK >> isn't received, the sender stops sending, and the TCP calls will >> block at >> the sender (or return an error if the socket is in non-blocking >> mode), until >> it gets a response or socket timeout. >> >> So if you have a non-buffering reader and a blocking sender, then >> you can >> stream the data and only relatively small amounts of data are >> buffered at >> any time. The problem is the reader in the HTTP client isn't >> waiting for the >> data to be demanded at all, instead as soon as data comes in, it >> sends it to >> a receiving erlang process. Erlang processes never block to receive >> messages, so there is no limit to the amount of data buffered. So >> if the >> Erlang process can't process the data fast enough, it starts getting >> buffered in it's mailbox, consuming unlimited memory. >> >> Assuming I understand the problem correctly, the way to fix it is >> to have >> the HTTP client not read the data until it's demanded by the >> consuming >> process. Then we are only using the default TCP buffers, not the >> Erlang >> message queues as a buffer, and the total amount of memory used at >> anytime >> is small. >> > > Dunno about HTTP clients, but when I was playing around with gen_tcp a > week or two ago I found a parameter to opening a socket that is > something like {active, false} that affects this specific > functionality. Active sockets send tcp data as Erlang messages, > inactive sockets don't and you have to get the data with > gen_tcp:recv(Sock). > > I haven't the foggiest if the HTTP bits expose any of that though. As far as I can tell, the {stream,{self,once}} translates to an inet:setopts(socket(), [{active,once}]), which accomplishes the same basic goal as {active,false}, just with repeated calls to setopts(Sock, [{active,once}]) instead of gen_tcp:recv(Sock). I must be missing something, though, because clearly I'm getting more messages than I asked for. I'm sure I could cook up something simple using gen_tcp directly, but even I'll have to deal with authentication, ssl, etc. so I'd prefer to use a full-fledged HTTP client if I can get it to work. Best, Adam