couchdb-user mailing list archives

From Norman Barker <>
Subject Re: chunked response and couch_doc_open
Date Fri, 23 Oct 2009 18:29:45 GMT
On Fri, Oct 23, 2009 at 12:19 PM, Paul Davis
<> wrote:
> On Fri, Oct 23, 2009 at 2:11 PM, Norman Barker <> wrote:
>> On Fri, Oct 23, 2009 at 11:33 AM, Paul Davis
>> <> wrote:
>>> On Fri, Oct 23, 2009 at 1:27 PM, Norman Barker <>
>>>> Hi,
>>>> is there a way (in Erlang) to open a CouchDB document and iterate
>>>> over the document body without loading the whole document into
>>>> memory?
>>>> I would like to use a chunked response to keep the system's memory
>>>> overhead low.
>>>> Not a CouchDB question as such: is there a way in Erlang to find
>>>> the size (in bytes) of a particular term?
>>>> Many thanks,
>>>> Norman
>>> Norman,
>>> Well, for document JSON we store Erlang term binaries on disk, so
>>> there's no real way to stream a doc across the wire from disk without
>>> loading the whole thing into RAM. Have you noticed CouchDB having
>>> memory issues under read loads? It's generally pretty light on its
>>> memory requirements for reads.
>>> The only way I know of to get the size of a term in bytes is the
>>> brute-force method: size(term_to_binary(Term)).
>>> Paul Davis
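A small sketch of the brute-force measurement Paul describes, next to `erlang:external_size/1`, a BIF that computes the (maximum) size of the same external-format encoding without building the binary. The module and function names (`term_size`, `encoded_bytes/1`, `external_bytes/1`) are hypothetical, chosen for illustration. Note that both figures describe the serialized external format, not how much heap the term occupies in a running process.

```erlang
-module(term_size).
-export([encoded_bytes/1, external_bytes/1]).

%% Brute-force approach from the thread: build the external-format
%% binary and measure it. This allocates the whole binary first.
-spec encoded_bytes(term()) -> non_neg_integer().
encoded_bytes(Term) ->
    byte_size(term_to_binary(Term)).

%% erlang:external_size/1 computes the (maximum) size of the same
%% encoding without materialising the binary.
-spec external_bytes(term()) -> non_neg_integer().
external_bytes(Term) ->
    erlang:external_size(Term).
```

For example, `term_size:encoded_bytes(<<>>)` returns 6: one version byte, one tag byte and a four-byte length field for the empty binary.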
>> I am sending sizeable JSON documents (a couple of MB each); as this
>> scales with X concurrent users, the problem grows. I have crashed
>> Erlang when the process reached about 1 GB of memory. (Note: this was
>> on Windows.) The workaround is to increase the memory allocation.
>> Erlang (and CouchDB) is fantastic in that it is so light to run
>> compared with a J2EE server; streaming documents out would be a good
>> optimisation. Running a CouchDB instance in under 30 MB of memory
>> would be my ideal.
>> If you can point me in the right direction, then this is something I
>> can contribute back; most of my Erlang code so far has been specific
>> to my application.
>> Many thanks,
>> Norman
> Norman,
> Streaming JSON docs in and out would require a massive amount of work
> rewriting much of the core of CouchDB, right down to making the JSON
> parsers stream-oriented. I'm not even sure where you'd start on such
> an undertaking.
> That said, a bug was reported earlier today about Windows oddly
> retaining memory for _bulk_docs calls; I wonder if there's a
> connection.
> Paul Davis

I was thinking that perhaps this could be done at the mochijson2
level, and I wonder whether, on the way out, an iterator approach
could be used within mochijson, though perhaps this affects the
on-disk storage format within CouchDB. Certainly it is an
optimisation, but without it scalability is limited, as is the premise
of running on low-end commodity hardware. No criticism intended; I
will be looking at this at some point.

