incubator-couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: btree refactoring
Date Tue, 08 Mar 2011 20:03:58 GMT
On Tue, Mar 8, 2011 at 2:28 PM, Randall Leeds <randall.leeds@gmail.com> wrote:
> When I start hacking on COUCHDB-1076 a couple nights ago I found that the
> cleanest way I could see for allowing kp_nodes to be skipped during
> traversal was for the fold function to couch_btree:foldl to be arity 4, with
> kp_node | kv_node as the first argument.
>
> Evaluating that function for every node (instead of just kv_nodes) lets us
> return {skip, Acc} to skip whole subtrees.
> Diving back into the couch_btree code to read over Damien's patch for
> COUCHDB-1084, it hit me that ModifyFun could instead be a function that was
> called for all the actions. We wouldn't need to return the query results
> because this action function could send them back to the client as they're
> found. Then we just keep the actions list as part of the accumulator and
> query_modify collapses into a btree:foldl and we no longer need so many
> similar-looking recursive functions. Off the cuff I envision modify_node
> being exported and simplified to be non-recursive and query_modify being a
> helper function to generate a fold function that uses modify_node (or
> something like this).
>
> Is this similar to anything you've done already, Paul? Would you all be
> interested if I took a crack at doing this kind of refactor?
>

If we want to talk about refactoring, I think the first place would be
to go back and finish up getting [1] into trunk. I almost got to it
one night but had a failing test and then was distracted by something
shiny.

I'd have to say I'm fairly against making the fold function use a
single function that does double duty on handling kp and kv nodes. It
just strikes me as not very elegant, not to mention, imagine rewriting
the httpd view functions with such a construct.

Though riffing off your idea, I could see passing a second function
that takes a Red/Acc pair (the same Acc as the normal fold fun) and
then uses that information to do whatever it is it needs to do. The
separation of concerns here seems quite a bit more elegant. Though I'd
also have a bit of concern about how the various edge cases interact,
ie, do we use the SkipFun before we reach the start key? Also, how
does this interact with reduce folds? Supported? Not supported? Etc
etc.


As to munging stream/query_modify into a single function that uses a
ModifyFun or some such also concerns me a bit. I would prefer to keep
them separate to make my brain hurt a bit less when considering
things. Especially to avoid errors like "omg! we lost docs cause of a
bug in handling _all_docs traversal" (which would become not
impossible).

Though I did already mention that we might want to reconsider changing
the ModifyFun idea from 1084 to be a function that's passed and
evaluated for each (Key, OldVal, NewVal) triple. Whether we go ahead
and do that for all actions and then make anyone wanting to use the
lookup part of query_modify use an Acc to gather the interesting bits
I'm not certain.

I can already say, that if we did go that route, it'd probably end up
easiest to change all uses of query_modify to have a signature of
something like couch_btree:modify(Bt, Keys, ModifyFun, InitAcc) and
then in the node handling for the modify function it'd allow people to
do whatever they wanted. The thing about this method is that it could
theoretically handle our current situation directly to involve zero
rewrites to existing usage of btree's yet allow us to systematically
go through and make similar sorts of improvements that Damien's
proposed in 1084.


[1] https://github.com/davisp/couchdb/blob/btree/src/couchdb/couch_btree.erl

Mime
View raw message