couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adam Kocoloski <kocol...@apache.org>
Subject Re: replication using _changes API
Date Fri, 12 Jun 2009 15:00:36 GMT
On Jun 12, 2009, at 10:47 AM, Damien Katz wrote:

> On Jun 12, 2009, at 8:59 AM, Adam Kocoloski wrote:
>
>> Hi Damien, I'm not sure I follow.  My worry was that, if I built a  
>> replicator which only queried _changes to get the list of updates,  
>> I'd have to be prepared to process a very large response.  I  
>> thought one smart way to process this response was to throttle the  
>> download at the TCP level by putting the socket into passive mode.
>
> You will have a very large response, but you can stream it,  
> processing one line at a time, then you discard the line and process  
> the next. As long as the writer is using a blocking socket and the  
> reader is only reading as much data as necessary to process a line,  
> you never need to store much of the data in memory on either side.  
> But it seems the HTTP client is buffering the data as it comes in,  
> perhaps unintentionally.
>
> With TCP, the sending side will only send so much data before  
> getting an ACK, acknowledgment that packets sent were actually  
> received. When an ACK isn't received, the sender stops sending, and  
> the TCP calls will block at the sender (or return an error if the  
> socket is in non-blocking mode), until it gets a response or socket  
> timeout.
>
> So if you have a non-buffering reader and a blocking sender, then  
> you can stream the data and only relatively small amounts of data  
> are buffered at any time. The problem is the reader in the HTTP  
> client isn't waiting for the data to be demanded at all, instead as  
> soon as data comes in, it sends it to a receiving erlang process.  
> Erlang processes never block to receive messages, so there is no  
> limit to the amount of data buffered. So if the Erlang process can't  
> process the data fast enough, it starts getting buffered in it's  
> mailbox, consuming unlimited memory.
>
> Assuming I understand the problem correctly, the way to fix it is to  
> have the HTTP client not read the data until it's demanded by the  
> consuming process. Then we are only using the default TCP buffers,  
> not the Erlang message queues as a buffer, and the total amount of  
> memory used at anytime is small.
>
> -Damien

Yes, we're definitely in agreement, just using different language.  I  
was trying to do exactly what you describe, but the HTTP client I was  
using seemed to be ignoring my request to switch the socket to passive  
mode (i.e. turn off buffering).  Best,

Adam

Mime
View raw message