couchdb-dev mailing list archives

From Mikeal Rogers <mikeal.rog...@gmail.com>
Subject Re: Proposal for changes in view server/protocol
Date Mon, 26 Jul 2010 21:57:00 GMT
On Mon, Jul 26, 2010 at 5:43 PM, J Chris Anderson <jchris@apache.org> wrote:

>
> On Jul 26, 2010, at 2:35 PM, Mikeal Rogers wrote:
>
> > After some conversations I've had in NYC this week and Mathias' great
> > post on the 10 biggest issues with CouchDB
> > (http://www.paperplanes.de/2010/7/26/10_annoying_things_about_couchdb.html)
> > I wanted to formally propose some changes to the view server/protocol.
> >
> > The first issue I want to tackle is the lack of CommonJS modules in
> > map/reduce. The reason for this is that we use a deterministic hash of all
> > the views in a design document in order to query it, so code pulled in
> > from a module wouldn't be reflected in that hash.
> >
> > First off, it would be great if we could separate out each view and cache
> > it based on its own hash. This way updating one view doesn't blow away the
> > entire design document. This has some large ramifications; for one thing,
> > it means that each view needs to keep its own last sequence, and while one
> > view is getting up to date it can't be included in generation when other
> > views are getting updated.
>
> -1 on splitting views into multiple indexes within a single ddoc. The
> performance gains of batching are too great to ignore.
>
> If you want to split views, put them in their own ddoc.
>

I'm not saying that we stop batching the generation of the views; I'm saying
that how we store and cache them should be separated. Then the only thing
you lose is generation performance while one of the views in a ddoc is
re-generating, and the *benefit* is that the other views aren't
re-generating.

When all views are up to date, they still batch their generation the same
way; we just store the btrees and meta info (like the sequence) separately.
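A minimal sketch of the split described above, assuming each view keeps its own rows and its own last sequence while generation is still batched. `ViewIndex`, `update_views` and the `map_doc(view, doc)` callback are hypothetical names for illustration, not CouchDB's actual (Erlang) indexer. Views that share a `last_seq` read the changes once and are mapped together; a view whose function just changed starts from 0 in its own group, so the others are not rebuilt.

```python
from dataclasses import dataclass, field


@dataclass
class ViewIndex:
    # Hypothetical per-view index: its own rows and its own last sequence,
    # stored separately from the other views in the same design doc.
    name: str
    signature: str  # the per-view hash
    last_seq: int = 0
    rows: dict = field(default_factory=dict)  # doc id -> rows emitted for it


def update_views(views, changes, map_doc):
    """Bring every view up to date, batching views that share a last_seq.

    `changes` is a list of (seq, doc) pairs in seq order, and
    `map_doc(view, doc)` stands in for the view server call that returns the
    rows a view emits for one document.
    """
    groups = {}
    for view in views:
        groups.setdefault(view.last_seq, []).append(view)
    # Views caught up to the same seq read the changes once and are mapped
    # together (the batch). A view whose function just changed starts over at
    # seq 0 in its own group, so the other views keep their existing rows.
    for since, group in groups.items():
        for seq, doc in changes:
            if seq <= since:
                continue
            for view in group:
                view.rows[doc["_id"]] = map_doc(view, doc)
                view.last_seq = seq
```

Once the rebuilt view catches up, every view has the same `last_seq` and they fall into a single group on the next update, which is the "batch their generation the same way" case.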


> >
> > Once each view has its own deterministic hash, I would propose that we
> > move the responsibility for generating the hash to a new view server call.
> > This call would get triggered during every design doc update and look
> > something like:
> >
> >     request:  ["hash", {"_id":"_design/foo", .......} ]
> >     response: ["views/bar","aoivniuasdf8ashd7zh87vxxz87gf8sd7"]
> >
> > The view server can inspect each map/reduce function, determine which
> > modules it imports, and include the source of those modules in the hash
> > for that particular view.
>
> That is fine by me (even great), but we will need to hash those hashes
> together on CouchDB's side to reflect the fact that all of a design doc's
> views are stored in a single index file.
>

Yeah, that's exactly what I'm saying we should stop doing: storing all the
indexes in a single file. We can still batch the generation, but invalidating
every view index when one changes really sucks and everybody hates it.
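A sketch of what the proposed `["hash", ddoc]` call might compute, assuming modules are pulled in with `require()` using a literal, slash-separated path resolved from the root of the design doc (no relative `./` paths). `view_hashes`, `module_sources` and `ddoc_signature` are hypothetical; returning a dict keyed by `"views/<name>"` is one reading of the response shown in the proposal, and MD5 is an arbitrary choice of digest. `ddoc_signature` shows the alternative raised above, folding the per-view hashes into one ddoc-wide hash.

```python
import hashlib
import re

# Matches require('some/path') or require("some/path") in function source.
# Assumption: paths are literal and resolve from the design doc's root.
REQUIRE_RE = re.compile(r"""require\(\s*['"]([^'"]+)['"]\s*\)""")


def lookup(ddoc, path):
    """Resolve a module path like 'lib/util' to its source string in the ddoc."""
    node = ddoc
    for part in path.split("/"):
        node = node[part]
    return node


def module_sources(ddoc, source, seen):
    """Collect path + source of every module `source` requires, transitively."""
    out = []
    for path in REQUIRE_RE.findall(source):
        if path in seen:
            continue
        seen.add(path)
        mod = lookup(ddoc, path)
        out.append(path + "\n" + mod)
        out.extend(module_sources(ddoc, mod, seen))
    return out


def view_hashes(ddoc):
    """Hypothetical handler for ["hash", ddoc]: one hash per view."""
    hashes = {}
    for name, view in sorted(ddoc.get("views", {}).items()):
        if "map" not in view:
            continue  # e.g. a library object rather than a view
        funs = [view["map"], view.get("reduce", "")]
        parts = list(funs)
        seen = set()
        for fun in funs:
            parts.extend(module_sources(ddoc, fun, seen))
        digest = hashlib.md5("\0".join(parts).encode("utf-8")).hexdigest()
        hashes["views/" + name] = digest
    return hashes


def ddoc_signature(hashes):
    """Alternative: fold the per-view hashes into one ddoc-wide hash."""
    joined = "".join(k + ":" + v + "\n" for k, v in sorted(hashes.items()))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()
```

With per-view hashes, editing a module changes only the hashes of the views that require it; folding them into one signature brings back the invalidate-everything behaviour.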


>
> >
> > The second issue I'd like to tackle is twofold: parallelized view
> > generation and unnecessarily chatty IO for large view generations.
> >
> > Currently, every single document is passed to the view server one at a
> > time and the response is read back one at a time. I would suggest that we
> > allow a user-configurable upper limit to "batch" documents to the view
> > server (100 by default). The request/response would remain exactly the
> > same as it is now, except there would be an extra array around the request
> > and response.
> >
> > This would also open up the ability for the view server to break up that
> > batch, pass it to different view servers, and then return the responses
> > all together (this obviously means it's limited to the speed of the client
> > handling that last chunk).
> >
> > Thoughts?
>
> This is good; we should do it. We should maybe spend a little more time in
> the design phase thinking about how to similarly fix the lingering bugs in
> the _list code, make externals non-blocking, etc.
>
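A sketch of the batching proposal under one reading of "an extra array around the request and response": a batch is a JSON array of ordinary `["map_doc", doc]` requests, answered by a JSON array of the ordinary per-document responses in the same order. `batch_requests` and `handle_batch` are hypothetical names. Threads stand in for the separate view server processes; in CPython they won't speed up CPU-bound map functions, so a real server would more likely use processes.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def batch_requests(docs, limit=100):
    """CouchDB side: group docs into batched lines of at most `limit` requests."""
    for i in range(0, len(docs), limit):
        yield json.dumps([["map_doc", doc] for doc in docs[i:i + limit]])


def handle_batch(line, map_one, workers=4, chunk_size=25):
    """View server side: answer one batched request line with one response line.

    Assumed format: the request is a JSON array of ordinary ["map_doc", doc]
    requests; the response is a JSON array of the ordinary per-doc responses,
    in the same order.
    """
    docs = [doc for _cmd, doc in json.loads(line)]
    chunks = [docs[i:i + chunk_size] for i in range(0, len(docs), chunk_size)]

    def run(chunk):
        return [map_one(doc) for doc in chunk]

    # Each chunk goes to a worker; pool.map returns results in submission
    # order, so reassembly is concatenation, and the reply waits on the
    # slowest chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(run, chunks))
    return json.dumps([resp for part in parts for resp in part])
```

Keeping the per-document response shape unchanged means an existing view server only needs a thin wrapper to accept batches.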

There is a thread from a few months ago on changing the _list protocol so
you can return status codes after you've started to evaluate the rows.

Another problem I hit recently while trying to get a view server working in
node.js is that the list functions' getRow() call is blocking, and it
actually blocks on the underlying readline implementation, so it can't be
supported using non-blocking IO. That kind of sucks, but fixing it means a
breaking change to the list functions' JavaScript API.
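Today a list function pulls rows with a synchronous `while (row = getRow())` loop. A sketch of what a non-blocking alternative could look like, using Python's asyncio as a stand-in for node.js: rows arrive through an async iterator, so waiting for the next row yields to the event loop instead of blocking on readline. The `head` argument and `send` callback loosely mirror the existing list API; `rows_from_couch`, `list_fn` and the `async for` style are hypothetical, not a proposed API.

```python
import asyncio


async def rows_from_couch(rows, delay=0.0):
    """Stand-in for rows arriving from CouchDB; awaits between rows instead of
    blocking on a readline() call."""
    for row in rows:
        await asyncio.sleep(delay)  # yield to the event loop
        yield row


async def list_fn(head, rows, send):
    """Hypothetical non-blocking list function: rows are consumed with
    `async for` instead of a blocking getRow() loop."""
    send("<p>%d rows</p><ul>" % head["total_rows"])
    async for row in rows:
        send("<li>%s: %s</li>" % (row["key"], row["value"]))
    send("</ul>")


async def main():
    out_a, out_b = [], []
    rows = [{"key": "a", "value": 1}, {"key": "b", "value": 2}]
    # Two list requests served concurrently on one event loop; neither blocks
    # the other while waiting for its next row.
    await asyncio.gather(
        list_fn({"total_rows": 2}, rows_from_couch(rows), out_a.append),
        list_fn({"total_rows": 2}, rows_from_couch(rows[::-1]), out_b.append),
    )
    return out_a, out_b
```

Run it with `out_a, out_b = asyncio.run(main())`. The catch, as noted above, is that every existing list function written against getRow() would have to change.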

-Mikeal
