incubator-couchdb-user mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: chunked response and couch_doc_open
Date Fri, 23 Oct 2009 18:38:54 GMT
On Fri, Oct 23, 2009 at 2:29 PM, Norman Barker <norman.barker@gmail.com> wrote:
> On Fri, Oct 23, 2009 at 12:19 PM, Paul Davis
> <paul.joseph.davis@gmail.com> wrote:
>> On Fri, Oct 23, 2009 at 2:11 PM, Norman Barker <norman.barker@gmail.com> wrote:
>>> On Fri, Oct 23, 2009 at 11:33 AM, Paul Davis
>>> <paul.joseph.davis@gmail.com> wrote:
>>>> On Fri, Oct 23, 2009 at 1:27 PM, Norman Barker <norman.barker@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> is there a way (in Erlang) to open a CouchDB document and iterate
>>>>> over the document body without loading the whole document into
>>>>> memory?
>>>>>
>>>>> I would like to use a chunked response to keep the system's memory
>>>>> overhead low.
>>>>>
>>>>> Not a CouchDB-specific question, but is there a method in Erlang to
>>>>> find the size (in number of bytes) of a particular term?
>>>>>
>>>>> many thanks,
>>>>>
>>>>> Norman
>>>>>
>>>>
>>>> Norman,
>>>>
>>>> Well, for document JSON we store Erlang term binaries on disk, so
>>>> there's no real way to stream a doc across the wire from disk without
>>>> loading the whole thing into RAM. Have you noticed CouchDB having
>>>> memory issues on read loads? It's generally pretty light on its
>>>> memory requirements for reads.
>>>>
>>>> The only way I know of to get the size of a term in bytes is the
>>>> brute-force method: size(term_to_binary(Term)).
>>>>
>>>> Paul Davis
>>>>
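
For concreteness, here's a minimal shell sketch of that brute-force
approach; note that term_to_binary/1 builds the entire binary, so
measuring a term temporarily costs about as much memory as the term
itself:

    %% In an Erlang shell: the second expression evaluates to the size,
    %% in bytes, of the term's external (term_to_binary) encoding.
    1> Term = {[{<<"_id">>, <<"example">>}, {<<"values">>, lists:seq(1, 1000)}]}.
    2> size(term_to_binary(Term)).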
>>>
>>> I am sending sizeable JSON documents (a couple of MB each), and as
>>> this scales by X concurrent users the problem grows. I have crashed
>>> Erlang when the process gets up to about 1 GB of memory (note, this
>>> was on Windows). The workaround is to increase the memory allocation.
>>>
>>> Erlang (and CouchDB) is fantastic in that it is so light to run
>>> compared to a J2EE server; streaming documents out would be a good
>>> optimisation. Running a CouchDB instance in < 30 MB of memory
>>> would be my ideal.
>>>
>>> If you can point me in the right direction, this is something I can
>>> contribute back; most of my Erlang code so far has been specific
>>> to my application.
>>>
>>> Many thanks,
>>>
>>> Norman
>>>
>>
>> Norman,
>>
>> Streaming JSON docs in and out would require a massive amount of work,
>> rewriting much of the core of CouchDB right down to making the JSON
>> parsers stream-oriented. I'm not even sure where you'd get started on
>> such an undertaking.
>>
>> There was a bug reported earlier today about Windows doing weird
>> things with retaining memory on _bulk_docs calls, though; I wonder if
>> there's a connection.
>>
>> Paul Davis
>>
> Paul,
>
> I was thinking that perhaps this could be done at the mochijson2
> level, and I wonder whether, on the way out, an iterator approach
> could be used within mochijson, though perhaps this impacts the
> on-disk storage format within CouchDB. Certainly it is an
> optimisation, but without it scalability is limited, as is the
> premise of running on low-end commodity hardware. No criticism
> intended; I will be looking at this at some point.
>
> Norman
>

Norman,

Even with a streaming mochijson2, most of the core expects to be
working with 'materialized' documents. Rewriting it to stream to and
from disk would require patches to at least mochijson2,
couch_httpd_*.erl, couch_db.erl, couch_db_updater.erl and, well,
pretty much all of CouchDB really.
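
To make that concrete, here is a rough, hypothetical sketch of what a
stream-oriented encoder could look like: a fold over an EJSON term that
emits iolist chunks through a callback instead of building one big
result. Nothing like this exists in mochijson2 today; the module and
function names are invented, and it only streams the output side (the
input term is still fully materialized, which is exactly the problem):

    %% Hypothetical sketch: walk an EJSON term ({Props} objects, lists,
    %% and scalar leaves) and hand each output chunk to Emit/1, e.g.
    %% Emit = fun(Chunk) -> couch_httpd:send_chunk(Resp, Chunk) end.
    -module(json_stream_sketch).
    -export([encode/2]).

    encode({Props}, Emit) when is_list(Props) ->
        Emit(<<"{">>),
        encode_props(Props, Emit),
        Emit(<<"}">>);
    encode(List, Emit) when is_list(List) ->
        Emit(<<"[">>),
        encode_elems(List, Emit),
        Emit(<<"]">>);
    encode(Term, Emit) ->
        %% Scalar leaves are small; reuse the ordinary encoder for them.
        Emit(mochijson2:encode(Term)).

    encode_props([], _Emit) ->
        ok;
    encode_props([{K, V} | Rest], Emit) ->
        Emit([mochijson2:encode(K), <<":">>]),
        encode(V, Emit),
        case Rest of
            [] -> ok;
            _  -> Emit(<<",">>), encode_props(Rest, Emit)
        end.

    encode_elems([], _Emit) ->
        ok;
    encode_elems([X | Rest], Emit) ->
        encode(X, Emit),
        case Rest of
            [] -> ok;
            _  -> Emit(<<",">>), encode_elems(Rest, Emit)
        end.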

It's hard to say what's best in terms of app design, but really, the
alternative is quite heavy and fairly unlikely to make it into trunk
anytime soon, if ever. Beyond just making it work, the amount of
complexity it'd add would most likely be prohibitive at best.
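
That said, the HTTP layer can already send a response in chunks today;
the limitation is only that the document itself gets materialized
first. A rough sketch against couch_httpd (handler wiring and error
handling omitted; ?JSON_ENCODE comes from couch_db.hrl):

    %% Sketch: send an already-opened doc out as a chunked response.
    %% The doc is fully in memory; only the transfer is chunked, so
    %% this does not reduce the server's peak memory use.
    send_doc_chunked(Req, Db, DocId) ->
        {ok, Doc} = couch_db:open_doc(Db, DocId, []),
        Json = ?JSON_ENCODE(couch_doc:to_json_obj(Doc, [])),
        {ok, Resp} = couch_httpd:start_chunked_response(Req, 200,
            [{"Content-Type", "application/json"}]),
        couch_httpd:send_chunk(Resp, Json),
        couch_httpd:last_chunk(Resp).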

Paul Davis
