couchdb-dev mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: [DISCUSS] Mango indexes on FDB
Date Fri, 27 Mar 2020 17:02:54 GMT
Thanks! For some reason your step 4 was elided in the GMail UI but not
when Garren responded and I was confused.

On Fri, Mar 27, 2020 at 9:11 AM Glynn Bird <glynn.bird@gmail.com> wrote:
>
> > The quoting here is weird. Are you saying to skip _all_docs in your
> > proposal, Glynn?
>
> I'm saying eliminate (3) from your list of things.
>
> 1. If user specifies an index, use it even if we have to wait
> 2. If an index is built that can be used, use it
> 3. n/a
> 4. As a last resort use _all_docs
>
>
> On Thu, 26 Mar 2020 at 16:59, Paul Davis <paul.joseph.davis@gmail.com>
> wrote:
>
> > On Thu, Mar 26, 2020 at 5:33 AM Will Holley <willholley@gmail.com> wrote:
> > >
> > > Ah - in that case I think we should remove step 3, as it leads to a
> > > confusing mental model. It's much simpler to explain that Mango will only
> > > use fresh indexes and any new indexes will build in the background.
> > >
> >
> > Simpler in some respect. The trade off being that we then have to
> > teach users how to know that an index is built and also that they then
> > need to be aware that different index types will have different ideas
> > of what "built" means.
> >
> > > On Thu, 26 Mar 2020 at 10:15, Garren Smith <garren@apache.org> wrote:
> > >
> > > > On Thu, Mar 26, 2020 at 11:04 AM Will Holley <willholley@gmail.com> wrote:
> > > >
> > > > > Broadly, I think it's a big step forward if we can prevent Mango from
> > > > > automatically selecting extremely stale indexes.
> > > > >
> > > > > I've been going back and forth on whether step 3 could lead to some
> > > > > difficult-to-predict behaviour. If we assume that requests have a
> > > > > short timeout - e.g. we can't return any result if it doesn't complete
> > > > > within the FDB transaction timeout - then I think it's fine: queries
> > > > > that use _all_docs and a large database will be timing out anyway.
> > > > >
> > > > > If we were to allow long-running queries then it seems a bit sketchier
> > > > > because adding an index to a large database could cause queries that
> > > > > previously completed to start timing out whilst they block on the index
> > > > > build. This is basically how Mango in CouchDB 2/3 behaves and has been a
> > > > > big pain point for customers I've worked with, to the point where you
> > > > > basically need to explicitly specify which index Mango uses in all cases
> > > > > if you're to avoid surprise timeouts when somebody adds a new index.
> > > > >
> > > > > As I understand it, we're not allowing queries to span FDB transactions
> > > > > so this latter case is not something to worry about?
> > > >
> > > >
> > > > We are going to allow queries to span transactions. This is already
> > > > implemented for views and will be for mango
> > > >
> > > >
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Will
> > > > >
> > > > > On Wed, 25 Mar 2020 at 19:43, Garren Smith <garren@apache.org> wrote:
> > > > >
> > > > > > On Wed, Mar 25, 2020 at 8:35 PM Paul Davis <paul.joseph.davis@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > > > It was therefore felt that having an immediate "Not ready" signal
> > > > > > > > > for just _some_ calls to _find, based on the type of backing index,
> > > > > > > > > was a bad and confusing api.
> > > > > > > >
> > > > > > > > > We also discussed _find calls where the user does not specify an
> > > > > > > > > index, and concluded that we would be free to choose between using
> > > > > > > > > the _all_docs index (which is always up to date but rarely the best
> > > > > > > > > index for a given selector) or blocking to update a better but
> > > > > > > > > stale index.
> > > > > > > >
> > > > > > > > Summary-ing my summarisation;
> > > > > > > >
> > > > > > > > > 1) if you specify an index, we'll use it even if we have to update
> > > > > > > > > it, no matter how long that takes.
> > > > > > > > > 2) if you don't specify an index, it's the dealer's choice. The
> > > > > > > > > details here may change in point releases.
> > > > > > > >
> > > > > > >
> > > > > > > > So it seems there's still a bit of confusion on what the consensus
> > > > > > > > is here. The way that I had thought this would work is that we'd do
> > > > > > > > something like such:
> > > > > > >
> > > > > > > > 1. If user specifies an index, use it even if we have to wait
> > > > > > > > 2. If an index is built that can be used, use it
> > > > > > > > 3. If an index is building that can be used, wait for it
> > > > > > > > 4. As a last resort use _all_docs
> > > > > > >
> > > > > > > > Discussing with Garren on the PR he's of the opinion that we should
> > > > > > > > skip step 3 and just go directly to using _all_docs if nothing is
> > > > > > > > built.
> > > > > > >
> > > > > >
> > > > > > > I just want to clarify step 3. I'm ok with using an index that still
> > > > > > > needs to be built as long as there is no other built index
> > > > > > > that can service the request.
> > > > > >
> > > > > > > So the big thing for me is to always prefer a built index over a
> > > > > > > building index. In the situation where there is only 1 building index
> > > > > > > versus all docs I'm ok with using the building index.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > > My main assumption is that most cases where a user is creating an
> > > > > > > > index and then wanting to run a query with it are in the
> > > > > > > > design/exploration phase of learning the feature or designing an
> > > > > > > > index to use. In that scenario if we skip waiting it seems likely
> > > > > > > > that a user could easily be led to believe that an index creation
> > > > > > > > "worked" for their selector when in reality it was just backed by
> > > > > > > > _all_docs.
> > > > > > > > The other reason for preferring to wait for an index to finish
> > > > > > > > building is that the UI for the normal case of creating indexes is
> > > > > > > > a bit awkward. Having to run a polling loop around checking the
> > > > > > > > index status seems suboptimal in most cases.
> > > > > > >
> > > > > > > > Am I missing other cases that would benefit from not waiting and
> > > > > > > > just using _all_docs?
> > > > > > >
> > > > > > > Paul
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
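
For readers following the thread, the selection order under discussion can be
sketched in a few lines of Python. This is only an illustration of the four
steps listed above, not CouchDB code: the Index record, its "built" and
"usable" fields, the choose_index function and the wait_for_building flag are
all hypothetical names made up for this example. The flag toggles the disputed
step 3; with it off, the behaviour matches the proposal to drop step 3 and
fall back to _all_docs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ALL_DOCS = "_all_docs"


@dataclass
class Index:
    """Hypothetical index record; not a real CouchDB structure."""
    name: str
    built: bool   # True once the background build has finished
    usable: bool  # True if the index can service the query's selector


def choose_index(
    indexes: List[Index],
    requested: Optional[str] = None,
    wait_for_building: bool = False,
) -> Tuple[str, bool]:
    """Return (index_name, must_wait) following the order from the thread."""
    # Step 1: a user-specified index is always used, even if we have to wait.
    if requested is not None:
        for idx in indexes:
            if idx.name == requested:
                return idx.name, not idx.built
        raise KeyError(f"no such index: {requested}")

    candidates = [idx for idx in indexes if idx.usable]

    # Step 2: always prefer an index that is already built.
    for idx in candidates:
        if idx.built:
            return idx.name, False

    # Step 3 (the disputed one): wait for a usable index that is still building.
    if wait_for_building and candidates:
        return candidates[0].name, True

    # Step 4: as a last resort use _all_docs, which is always up to date.
    return ALL_DOCS, False
```

With a single usable index that is still building, choose_index returns
_all_docs when wait_for_building is False and returns that index with
must_wait set when it is True. A built index is preferred over a building one
in both modes.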
