couchdb-dev mailing list archives

From Glynn Bird <glynn.b...@gmail.com>
Subject Re: [DISCUSS] Mango indexes on FDB
Date Fri, 27 Mar 2020 14:10:43 GMT
> The quoting here is weird. Are you saying to skip _all_docs in your
> proposal, Glynn?

I'm saying eliminate (3) from your list of things.

1. If user specifies an index, use it even if we have to wait
2. If an index is built that can be used, use it
3. n/a
4. As a last resort use _all_docs
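
As a rough sketch of that ordering (Python purely for illustration; the
`Index` record, its `built`/`usable` flags and the `choose_index` helper are
hypothetical names, not anything in the Mango codebase):

```python
from dataclasses import dataclass
from typing import List, Optional

ALL_DOCS = "_all_docs"


@dataclass
class Index:
    """Hypothetical view of an index as the query planner might see it."""
    name: str
    built: bool   # True once the background build has finished
    usable: bool  # True if the index can service the query's selector


def choose_index(indexes: List[Index], requested: Optional[str] = None) -> str:
    # 1. If the user specifies an index, use it. Waiting for it to build
    #    is left to the caller; this function only picks the name.
    if requested is not None:
        return requested

    # 2. If an index is built that can be used, use it.
    for idx in indexes:
        if idx.usable and idx.built:
            return idx.name

    # 3. n/a: an index that is still building is never waited on implicitly.

    # 4. As a last resort use _all_docs.
    return ALL_DOCS
```

With this ordering, adding a new index to a large database cannot make an
existing query block on the build unless the query names that index itself.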


On Thu, 26 Mar 2020 at 16:59, Paul Davis <paul.joseph.davis@gmail.com>
wrote:

> On Thu, Mar 26, 2020 at 5:33 AM Will Holley <willholley@gmail.com> wrote:
> >
> > Ah - in that case I think we should remove step 3, as it leads to a
> > confusing mental model. It's much simpler to explain that Mango will only
> > use fresh indexes and any new indexes will build in the background.
> >
>
> Simpler in some respect. The trade off being that we then have to
> teach users how to know that an index is built and also that they then
> need to be aware that different index types will have different ideas
> of what "built" means.
>
> > On Thu, 26 Mar 2020 at 10:15, Garren Smith <garren@apache.org> wrote:
> >
> > > On Thu, Mar 26, 2020 at 11:04 AM Will Holley <willholley@gmail.com> wrote:
> > >
> > > > Broadly, I think it's a big step forward if we can prevent Mango from
> > > > automatically selecting extremely stale indexes.
> > > >
> > > > I've been going back and forth on whether step 3 could lead to some
> > > > difficult-to-predict behaviour. If we assume that requests have a short
> > > > timeout - e.g. we can't return any result if it doesn't complete within
> > > > the FDB transaction timeout - then I think it's fine: queries that use
> > > > _all_docs and a large database will be timing out anyway.
> > > >
> > > > If we were to allow long-running queries then it seems a bit sketchier
> > > > because adding an index to a large database could cause queries that
> > > > previously completed to start timing out whilst they block on the index
> > > > build. This is basically how Mango in CouchDB 2/3 behaves and has been a
> > > > big pain point for customers I've worked with, to the point where you
> > > > basically need to explicitly specify which index Mango uses in all cases
> > > > if you're to avoid surprise timeouts when somebody adds a new index.
> > > >
> > > > As I understand it, we're not allowing queries to span FDB transactions
> > > > so this latter case is not something to worry about?
> > >
> > >
> > > We are going to allow queries to span transactions. This is already
> > > implemented for views and will be for Mango.
> > >
> > >
> > > >
> > > > Cheers,
> > > >
> > > > Will
> > > >
> > > > On Wed, 25 Mar 2020 at 19:43, Garren Smith <garren@apache.org> wrote:
> > > >
> > > > > On Wed, Mar 25, 2020 at 8:35 PM Paul Davis <paul.joseph.davis@gmail.com> wrote:
> > > > >
> > > > > > > > It was therefore felt that having an immediate "Not ready" signal for
> > > > > > > > just _some_ calls to _find, based on the type of backing index, was a
> > > > > > > > bad and confusing api.
> > > > > > >
> > > > > > > > We also discussed _find calls where the user does not specify an
> > > > > > > > index, and concluded that we would be free to choose between using the
> > > > > > > > _all_docs index (which is always up to date but rarely the best index
> > > > > > > > for a given selector) or blocking to update a better but stale index.
> > > > > > >
> > > > > > > Summary-ing my summarisation;
> > > > > > >
> > > > > > > > 1) if you specify an index, we'll use it even if we have to update it,
> > > > > > > > no matter how long that takes.
> > > > > > > > 2) if you don't specify an index, it's the dealer's choice. The details
> > > > > > > > here may change in point releases.
> > > > > > >
> > > > > >
> > > > > > > So it seems there's still a bit of confusion on what the consensus is
> > > > > > > here. The way that I had thought this would work is that we'd do
> > > > > > > something like such:
> > > > > >
> > > > > > > 1. If user specifies an index, use it even if we have to wait
> > > > > > 2. If an index is built that can be used, use it
> > > > > > 3. If an index is building that can be used, wait for it
> > > > > > 4. As a last resort use _all_docs
> > > > > >
> > > > > > > Discussing with Garren on the PR he's of the opinion that we should
> > > > > > > skip step 3 and just go directly to using _all_docs if nothing is
> > > > > > > built.
> > > > > >
> > > > >
> > > > > > I just want to clarify step 3. I'm ok with using an index that still
> > > > > > needs to be built as long as there is no other built index
> > > > > > that can service the request.
> > > > >
> > > > > > So the big thing for me is to always prefer a built index over a
> > > > > > building index. In the situation where there is only 1 building index
> > > > > > versus all docs I'm ok with using the building index.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > > My main assumption is that most cases where a user is creating an
> > > > > > > index and then wanting to run a query with it are in the
> > > > > > > design/exploration phase of learning the feature or designing an
> > > > > > > index to use. In that scenario if we skip waiting it seems likely that
> > > > > > > a user could easily be led to believe that an index creation "worked"
> > > > > > > for their selector when in reality it was just backed by _all_docs.
> > > > > >
> > > > > > > The other reason for preferring to wait for an index to finish
> > > > > > > building is that the UI for the normal case of creating indexes is a
> > > > > > > bit awkward. Having to run a polling loop around checking the index
> > > > > > > status seems suboptimal in most cases.
> > > > > >
> > > > > > > Am I missing other cases that would benefit from not waiting and just
> > > > > > > using _all_docs?
> > > > > >
> > > > > > Paul
> > > > > >
> > > > >
> > > >
> > >
>
