couchdb-dev mailing list archives

From Will Holley <willhol...@gmail.com>
Subject Re: [DISCUSS] Mango indexes on FDB
Date Thu, 26 Mar 2020 09:04:13 GMT
Broadly, I think it's a big step forward if we can prevent Mango from
automatically selecting extremely stale indexes.

I've been going back and forth on whether step 3 could lead to some
difficult-to-predict behaviour. If we assume that requests have a short
timeout - e.g. we can't return any result if it doesn't complete within the
FDB transaction timeout - then I think it's fine: queries that use
_all_docs on a large database will be timing out anyway.

If we were to allow long-running queries then it seems a bit sketchier
because adding an index to a large database could cause queries that
previously completed to start timing out whilst they block on the index
build. This is basically how Mango in CouchDB 2/3 behaves and has been a
big pain point for customers I've worked with, to the point where you
basically need to explicitly specify which index Mango uses in all cases if
you're to avoid surprise timeouts when somebody adds a new index.
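
For illustration, pinning the index on every query might look like the sketch below. It only builds the JSON body for a `_find` request using the `use_index` field; the helper name `build_find_body`, the design document name and the index name are hypothetical, and no request is actually sent.

```python
import json


def build_find_body(selector, design_doc, index_name=None):
    """Build a _find request body that names the index explicitly.

    `design_doc` and `index_name` are placeholders for whatever the
    index was created under; they are assumptions for this sketch.
    """
    # use_index accepts either a design document name or a
    # [design_doc, index_name] pair.
    use_index = [design_doc, index_name] if index_name else design_doc
    return {"selector": selector, "use_index": use_index}


body = build_find_body({"type": "order"}, "orders-ddoc", "by-type")
print(json.dumps(body, indent=2))
```

The resulting body would be POSTed to the database's `_find` endpoint, so that a newly added index cannot silently change which index serves the query.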

As I understand it, we're not allowing queries to span FDB transactions so
this latter case is not something to worry about?

Cheers,

Will

On Wed, 25 Mar 2020 at 19:43, Garren Smith <garren@apache.org> wrote:

> On Wed, Mar 25, 2020 at 8:35 PM Paul Davis <paul.joseph.davis@gmail.com>
> wrote:
>
> > > It was therefore felt that having an immediate "Not ready" signal for
> > just _some_ calls to _find, based on the type of backing index, was a bad
> > and confusing api.
> > >
> > > We also discussed _find calls where the user does not specify an index,
> > and concluded that we would be free to choose between using the _all_docs
> > index (which is always up to date but rarely the best index for a given
> > selector) or blocking to update a better but stale index.
> > >
> > > Summary-ing my summarisation;
> > >
> > > 1) if you specify an index, we'll use it even if we have to update it,
> > no matter how long that takes.
> > > 2) if you don't specify an index, it's the dealer's choice. The details
> > here may change in point releases.
> > >
> >
> > So it seems there's still a bit of confusion on what the consensus is
> > here. The way that I had thought this would work is that we'd do
> > something like such:
> >
> > 1. If the user specifies an index, use it even if we have to wait
> > 2. If an index is built that can be used, use it
> > 3. If an index is building that can be used, wait for it
> > 4. As a last resort use _all_docs
> >
> > Discussing with Garren on the PR he's of the opinion that we should
> > skip step 3 and just go directly to using _all_docs if nothing is
> > built.
> >
>
> I just want to clarify step 3. I'm ok with using an index that still needs
> to be built as long as there is no other built index
> that can service the request.
>
> So the big thing for me is to always prefer a built index over a building
> index. In the situation where there is only 1 building index versus all
> docs I'm ok with using the building index.
>
>
>
>
> > My main assumption is that most cases where a user is creating an
> > index and then wanting to run a query with it are in the
> > design/exploration phase of learning the feature or designing an index
> > to use. In that scenario if we skip waiting it seems likely that a
> > user could easily be led to believe that an index creation "worked"
> > for their selector when in reality it was just backed by _all_docs.
> >
> > The other reason for preferring to wait for an index to finish
> > building is that the UI for the normal case of creating indexes is a
> > bit awkward. Having to run a polling loop around checking the index
> > status seems suboptimal in most cases.
> >
> > Am I missing other cases that would benefit from not waiting and just
> > using _all_docs?
> >
> > Paul
> >
>
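
The selection order discussed in the quoted messages (an explicitly requested index first, then a built index, then a building index, and `_all_docs` as a last resort) can be sketched roughly as follows. This is a minimal illustration, not the actual Mango implementation: the `Index` type, the `choose_index` function and the `must_wait` flag are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

ALL_DOCS = "_all_docs"


@dataclass(frozen=True)
class Index:
    """Hypothetical index descriptor (not a CouchDB type)."""
    name: str
    built: bool          # False while the index is still building
    usable: bool = True  # whether it can service the selector


def choose_index(candidates: Sequence[Index],
                 requested: Optional[str] = None) -> Tuple[str, bool]:
    """Return (index name, must_wait) following the order in the thread."""
    # 1. A user-specified index is used even if we have to wait for it.
    if requested is not None:
        for idx in candidates:
            if idx.name == requested:
                return idx.name, not idx.built
        raise LookupError(f"no such index: {requested}")

    usable = [idx for idx in candidates if idx.usable]

    # 2. Always prefer a built index over a building one.
    for idx in usable:
        if idx.built:
            return idx.name, False

    # 3. Only a building index is available: wait for it.
    for idx in usable:
        return idx.name, True

    # 4. Last resort: _all_docs, which is always up to date.
    return ALL_DOCS, False


print(choose_index([Index("by-type", built=False), Index("by-date", built=True)]))
print(choose_index([Index("by-type", built=False)]))
print(choose_index([]))
```

Skipping step 3, as was also proposed in the thread, would amount to removing the second loop so that a query falls straight through to `_all_docs` when nothing is built.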
