couchdb-dev mailing list archives

From Glynn Bird <glynn.b...@gmail.com>
Subject Re: [DISCUSS] Mango indexes on FDB
Date Thu, 26 Mar 2020 09:41:14 GMT
Agree with Will that falling back to _all_docs-powered queries is usually
undesirable in all but the smallest data sets. More folks than you'd think
end up going into production without the right index because the
_all_docs-powered query in development (with a small data set) seemed to be
fast enough.

I always advise people to use "use_index" so they get the predictability of
"this query uses that index". You're then left with the user wondering
whether index X is built yet and for that they have to navigate
_active_tasks or poll a query until it returns something, which is a little
primitive but probably beyond the scope of Garren's original post.
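
A minimal sketch of that pattern, assuming Python and a caller-supplied
readiness check. The design document and index names are hypothetical, and
`find_body` / `wait_until` are illustrative helpers, not part of any CouchDB
client library. Only the `selector`, `use_index` and `limit` fields come from
the Mango `_find` request format.

```python
import time
from typing import Callable


def find_body(selector: dict, ddoc: str, index_name: str, limit: int = 25) -> dict:
    """Build a _find request body that pins the query to one named index."""
    return {
        "selector": selector,
        # use_index takes either a design document name or [ddoc, index name].
        "use_index": [ddoc, index_name],
        "limit": limit,
    }


def wait_until(
    is_ready: Callable[[], bool],
    attempts: int = 10,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_ready() up to `attempts` times, sleeping `delay` seconds between tries.

    is_ready is supplied by the caller; it might inspect _active_tasks or run
    a cheap query against the index (both are assumptions, not shown here).
    """
    for attempt in range(attempts):
        if is_ready():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

The body returned by `find_body` would be POSTed to `/{db}/_find`. The
polling helper takes the readiness check as a parameter, so the same loop
works whichever signal you decide to trust.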

On Thu, 26 Mar 2020 at 09:04, Will Holley <willholley@gmail.com> wrote:

> Broadly, I think it's a big step forward if we can prevent Mango from
> automatically selecting extremely stale indexes.
>
> I've been going back and forth on whether step 3 could lead to some
> difficult-to-predict behaviour. If we assume that requests have a short
> timeout - e.g. we can't return any result if it doesn't complete within the
> FDB transaction timeout - then I think it's fine: queries that use
> _all_docs and a large database will be timing out anyway.
>
> If we were to allow long-running queries then it seems a bit sketchier
> because adding an index to a large database could cause queries that
> previously completed to start timing out whilst they block on the index
> build. This is basically how Mango in CouchDB 2/3 behaves and has been a
> big pain point for customers I've worked with, to the point where you
> basically need to explicitly specify which index Mango uses in all cases if
> you're to avoid surprise timeouts when somebody adds a new index.
>
> As I understand it, we're not allowing queries to span FDB transactions so
> this latter case is not something to worry about?
>
> Cheers,
>
> Will
>
> On Wed, 25 Mar 2020 at 19:43, Garren Smith <garren@apache.org> wrote:
>
> > On Wed, Mar 25, 2020 at 8:35 PM Paul Davis <paul.joseph.davis@gmail.com>
> > wrote:
> >
> > > > It was therefore felt that having an immediate "Not ready" signal for
> > > > just _some_ calls to _find, based on the type of backing index, was a
> > > > bad and confusing api.
> > > >
> > > > We also discussed _find calls where the user does not specify an
> > > > index, and concluded that we would be free to choose between using the
> > > > _all_docs index (which is always up to date but rarely the best index
> > > > for a given selector) or blocking to update a better but stale index.
> > > >
> > > > Summary-ing my summarisation;
> > > >
> > > > 1) if you specify an index, we'll use it even if we have to update it,
> > > > no matter how long that takes.
> > > > 2) if you don't specify an index, it's the dealer's choice. The details
> > > > here may change in point releases.
> > > >
> > >
> > > So it seems there's still a bit of confusion on what the consensus is
> > > here. The way that I had thought this would work is that we'd do
> > > something like such:
> > >
> > > 1. If user specifies an index, use it even if we have to wait
> > > 2. If an index is built that can be used, use it
> > > 3. If an index is building that can be used, wait for it
> > > 4. As a last resort use _all_docs
> > >
> > > Discussing with Garren on the PR he's of the opinion that we should
> > > skip step 3 and just go directly to using _all_docs if nothing is
> > > built.
> > >
> >
> > I just want to clarify step 3. I'm ok with using an index that still
> > needs to be built as long as there is no other built index
> > that can service the request.
> >
> > So the big thing for me is to always prefer a built index over a building
> > index. In the situation where there is only 1 building index versus all
> > docs I'm ok with using the building index.
> >
> >
> >
> >
> > > My main assumption is that most cases where a user is creating an
> > > index and then wanting to run a query with it are in the
> > > design/exploration phase of learning the feature or designing an index
> > > to use. In that scenario if we skip waiting it seems likely that a
> > > user could easily be led to believe that an index creation "worked"
> > > for their selector when in reality it was just backed by _all_docs.
> > >
> > > The other reason for preferring to wait for an index to finish
> > > building is that the UI for the normal case of creating indexes is a
> > > bit awkward. Having to run a polling loop around checking the index
> > > status seems suboptimal in most cases.
> > >
> > > Am I missing other cases that would benefit from not waiting and just
> > > using _all_docs?
> > >
> > > Paul
> > >
> >
>
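
For reference, the selection order discussed in the quoted thread (Paul's
four steps, with Garren's clarification that a built index is always
preferred over a building one) can be sketched as below. This is a toy model
in Python, not CouchDB's implementation; the `Index` type, its fields and
`choose_index` are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

ALL_DOCS = "_all_docs"


@dataclass(frozen=True)
class Index:
    name: str
    built: bool   # False means the index is still building
    usable: bool  # True if the index can service the selector


def choose_index(indexes: Sequence[Index], requested: Optional[str] = None) -> str:
    """Return the name of the index a query would use under the proposed rules."""
    # 1. A user-specified index is always used, even if we have to wait for it.
    if requested is not None:
        return requested

    candidates = [ix for ix in indexes if ix.usable]

    # 2. Prefer any usable index that is already built.
    for ix in candidates:
        if ix.built:
            return ix.name

    # 3. Otherwise wait for a usable index that is still building.
    if candidates:
        return candidates[0].name

    # 4. As a last resort, fall back to _all_docs.
    return ALL_DOCS
```

Under this model a building index is chosen only when no built index can
service the request, which is the behaviour Garren describes for step 3.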
