Date: Fri, 23 Oct 2009 12:29:45 -0600
Subject: Re: chunked response and couch_doc_open
From: Norman Barker
To: user@couchdb.apache.org

On Fri, Oct 23, 2009 at 12:19 PM, Paul Davis wrote:
> On Fri, Oct 23, 2009 at 2:11 PM, Norman Barker wrote:
>> On Fri, Oct 23, 2009 at 11:33 AM, Paul Davis wrote:
>>> On Fri, Oct 23, 2009 at 1:27 PM, Norman Barker wrote:
>>>> Hi,
>>>>
>>>> Is there a way (in Erlang) to open a CouchDB document and iterate
>>>> over the document body without having to load the whole document
>>>> into memory?
>>>>
>>>> I would like to use a chunked response to keep the system's memory
>>>> overhead low.
>>>>
>>>> Not a CouchDB-specific question, but is there a method in Erlang to
>>>> find the size (in bytes) of a particular term?
>>>>
>>>> Many thanks,
>>>>
>>>> Norman
>>>>
>>>
>>> Norman,
>>>
>>> Well, for document JSON we store Erlang term binaries on disk, so
>>> there's no real way to stream a doc across the wire from disk without
>>> loading the whole thing into RAM. Have you noticed CouchDB having
>>> memory issues on read loads?
>>> It's generally pretty light on its memory requirements for reads.
>>>
>>> The only way to get the size of a Term in bytes that I know of is the
>>> brute-force size(term_to_binary(Term)) method.
>>>
>>> Paul Davis
>>>
>>
>> I am sending sizeable JSON documents (a couple of MB); as this scales
>> with X concurrent users the problem grows. I have crashed Erlang
>> when the process gets up to about 1 GB of memory. (Note, this was on
>> Windows.) The workaround is to increase the memory allocation.
>>
>> Erlang (and CouchDB) is fantastic in that it is so light to run as
>> opposed to a J2EE server; streaming documents out would be a good
>> optimisation. Running a CouchDB instance in < 30 MB of memory
>> would be my ideal.
>>
>> If you can point me in the right direction then this is something I
>> can contribute back; most of my Erlang code so far has been specific
>> to my application.
>>
>> Many thanks,
>>
>> Norman
>>
>
> Norman,
>
> Streaming JSON docs in and out would require massive amounts of work,
> rewriting much of the core of CouchDB, right down to making the JSON
> parsers stream-oriented. I'm not even sure where you'd get started on
> such an undertaking.
>
> Though there was a bug reported earlier today about Windows doing weird
> things with retaining memory for _bulk_docs calls; I wonder if there's
> a connection.
>
> Paul Davis
>

Paul,

I was thinking that perhaps this could be done at the mochijson2 level,
and wonder whether, on the way out, there is an iterator approach that
could be used within mochijson2, though perhaps this impacts the on-disk
storage format within CouchDB.

Certainly it is an optimisation, but without it scalability is limited,
as is the premise of running on low-cost commodity hardware. No
criticism intended; I will be looking at this at some point.

Norman
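
A minimal sketch of the two points discussed in this thread, assuming a
CouchDB 0.10-era source tree (couch_db.hrl for the ?JSON_ENCODE macro,
couch_httpd for the chunked-response helpers, and
couch_httpd_db:couch_doc_open/4 from the subject line). The module name
doc_stream_sketch, the function names, and the 64 KB chunk size are
illustrative only, not existing CouchDB code:

    %% Sketch only: the document is still loaded fully into memory by
    %% couch_doc_open/4 and encoded in one pass, which is exactly the
    %% limitation Paul describes; chunking here only spreads the write
    %% over the socket rather than buffering one large response.
    -module(doc_stream_sketch).
    -include("couch_db.hrl").  %% provides ?JSON_ENCODE
    -export([term_bytes/1, stream_doc/3]).

    %% Brute-force size of a term in bytes, as mentioned above:
    %% serialize with term_to_binary/1 and measure the binary.
    term_bytes(Term) ->
        size(term_to_binary(Term)).

    %% Send an already-loaded document back as an HTTP chunked response.
    stream_doc(Req, Db, DocId) ->
        Doc = couch_httpd_db:couch_doc_open(Db, DocId, nil, []),
        Json = iolist_to_binary(?JSON_ENCODE(couch_doc:to_json_obj(Doc, []))),
        {ok, Resp} = couch_httpd:start_chunked_response(Req, 200,
            [{"Content-Type", "application/json"}]),
        lists:foreach(fun(Chunk) -> couch_httpd:send_chunk(Resp, Chunk) end,
                      split_chunks(Json, 65536)),
        couch_httpd:last_chunk(Resp).

    %% Split a binary into pieces of at most Size bytes.
    split_chunks(Bin, Size) when byte_size(Bin) =< Size ->
        [Bin];
    split_chunks(Bin, Size) ->
        <<Head:Size/binary, Rest/binary>> = Bin,
        [Head | split_chunks(Rest, Size)].

True low-memory streaming of the kind Norman asks about would need the
encoder (mochijson2 or its replacement) to emit output incrementally
rather than building the whole iolist first, which is the core rewrite
Paul refers to.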