Mailing-List: contact user-help@couchdb.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@couchdb.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CAHGcXegVvZF+Z2-hqH0uJHt694NBkGTe==WQJ9Zx=N+NRd7Gsw@mail.gmail.com>
References: 
 <CAHGcXegx-rXPE=ktkgUeE+BKXvvNirebD_ENfVoVn-p2VBAG_Q@mail.gmail.com>
	<CAHGcXeg1J_ZJ5-p_3=znz4S8M2sKJu9nZkSpPsX052ObRHEonQ@mail.gmail.com>
	<CAHGcXegVvZF+Z2-hqH0uJHt694NBkGTe==WQJ9Zx=N+NRd7Gsw@mail.gmail.com>
Date: Thu, 6 Dec 2012 23:19:11 +0000
Message-ID: 
 <CABvT1DEyvDpns72UJ22Mbk2s6JavWQqDVtJ26dyDRhfLrCwr6Q@mail.gmail.com>
Subject: Re: growable arrays in reductions
From: Robert Newson <rnewson@apache.org>
To: "user@couchdb.apache.org" <user@couchdb.apache.org>
Content-Type: text/plain; charset=ISO-8859-1

If reduce_limit didn't bite you, and you have plenty of documents,
you're probably fine. It does sound like you're skating near the edge,
though.

The reason for the warning is that intermediate reduce values are
stored in the b+tree, so if they grow, rather than shrink, the b+tree
becomes progressively slower (i.e., we start violating the constraints
that make b+trees work).
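
To make the growth concrete, here is a rough sketch in Python rather
than in a real CouchDB reduce function (those are JavaScript, called
as function(keys, values, rereduce)). It only simulates reducing in
chunks and then rereducing the partial results. Every name in it
(reduce_summary, reduce_with_ids, reduce_in_chunks, the "txn-" ids,
the chunk size of 50) is made up for illustration and is not a CouchDB
API or setting.

```python
import json


def reduce_summary(values, rereduce=False):
    """Bounded reduction: the output has the same shape for any input size."""
    # Map rows and partial reductions share one shape in this sketch,
    # so the rereduce flag needs no special handling.
    return {
        "total": sum(v["total"] for v in values),
        "count": sum(v["count"] for v in values),
    }


def reduce_with_ids(values, rereduce=False):
    """Same summary, plus every transaction id seen so far (grows linearly)."""
    out = reduce_summary(values, rereduce)
    out["ids"] = [txn_id for v in values for txn_id in v["ids"]]
    return out


def rows(n):
    """Hypothetical map output: one row per cart transaction."""
    return [{"total": 1, "count": 1, "ids": ["txn-%06d" % i]} for i in range(n)]


def reduce_in_chunks(fn, values, chunk=50):
    """Crude stand-in for the tree: reduce chunks, then rereduce the results."""
    rereduce = False
    while len(values) > 1:
        values = [fn(values[i:i + chunk], rereduce)
                  for i in range(0, len(values), chunk)]
        rereduce = True
    return values[0]


def encoded_size(value):
    """Size of the JSON encoding, as a proxy for what an inner node must hold."""
    return len(json.dumps(value))


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        small = reduce_in_chunks(reduce_summary, rows(n))
        big = reduce_in_chunks(reduce_with_ids, rows(n))
        print(n, encoded_size(small), encoded_size(big))
```

The summary-only reduction stays a few dozen bytes however many rows
go in, while the one carrying the id list grows in step with the row
count. That second pattern is the kind of growth the warning is about.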

B.

On 6 December 2012 23:08, Will Heger <will.heger@gmail.com> wrote:
> In the end, I could write a list function, but I do so at the cost
> of caching and incremental update.  For example, I have a grocery cart
> that is described by a series of transactions, items added, items
> removed.  If I wanted to keep a total bill, taxes, item count, in a
> reduction, that would be a pretty canonical reduction along the lines
> of the Event Sourcing design pattern.
>
> My question is whether appending a list of the underlying transaction
> ids would create a problem.
>
> "As a rule of thumb, the data returned by reduce functions should
> remain "smallish" and not grow faster than log(num_rows_processed)."
>
> I'm not totally clear on how to parse this statement.  Is the size
> related to size of mapped input documents?  Collectively or
> individually measured?
>
> Having the underlying ids would allow me to "close-the-loop" from a
> transactions standpoint.  For example, I have ten different clients
> contributing to this one cart.  Any particular client can then
> instantly recognize whether her contribution is factored into the
> summary by scanning for her id within the transaction list.
>
> There are other methods for achieving this, but if this is not going
> to cause a problem, it is presently the most elegant for my
> application.  But beyond this, I'm just interested in what amount of
> growth is allowable.
>
> "From 0.10 onwards, CouchDB uses a heuristic to detect reduce
> functions that won't scale to give the developer an early warning"
>
> So far Couch has not complained to me about any of the reductions I've
> written, but I still feel like I'm flying a bit blind.