From: Paul Davis
Date: Fri, 27 Mar 2020 12:02:54 -0500
Subject: Re: [DISCUSS] Mango indexes on FDB
To: dev@couchdb.apache.org

Thanks! For some reason your step 4 was elided in the Gmail UI but not
when Garren responded, and I was confused.

On Fri, Mar 27, 2020 at 9:11 AM Glynn Bird wrote:
>
> > The quoting here is weird. Are you saying to skip _all_docs in your
> > proposal, Glynn?
>
> I'm saying eliminate (3) from your list of things.
>
> 1. If user specifies an index, use it even if we have to wait
> 2. If an index is built that can be used, use it
> 3. n/a
> 4. As a last resort use _all_docs
>
>
> On Thu, 26 Mar 2020 at 16:59, Paul Davis wrote:
>
> > On Thu, Mar 26, 2020 at 5:33 AM Will Holley wrote:
> > >
> > > Ah - in that case I think we should remove step 3, as it leads to a
> > > confusing mental model. It's much simpler to explain that Mango will only
> > > use fresh indexes and any new indexes will build in the background.
> > >
> >
> > Simpler in some respect. The trade-off being that we then have to
> > teach users how to know that an index is built and also that they then
> > need to be aware that different index types will have different ideas
> > of what "built" means.
> >
> > > On Thu, 26 Mar 2020 at 10:15, Garren Smith wrote:
> > >
> > > > On Thu, Mar 26, 2020 at 11:04 AM Will Holley wrote:
> > > >
> > > > > Broadly, I think it's a big step forward if we can prevent Mango from
> > > > > automatically selecting extremely stale indexes.
> > > > >
> > > > > I've been going back and forth on whether step 3 could lead to some
> > > > > difficult-to-predict behaviour. If we assume that requests have a short
> > > > > timeout - e.g. we can't return any result if it doesn't complete within
> > > > > the FDB transaction timeout - then I think it's fine: queries that use
> > > > > _all_docs and a large database will be timing out anyway.
> > > > >
> > > > > If we were to allow long-running queries then it seems a bit sketchier
> > > > > because adding an index to a large database could cause queries that
> > > > > previously completed to start timing out whilst they block on the index
> > > > > build. This is basically how Mango in CouchDB 2/3 behaves and has been a
> > > > > big pain point for customers I've worked with, to the point where you
> > > > > basically need to explicitly specify which index Mango uses in all cases
> > > > > if you're to avoid surprise timeouts when somebody adds a new index.
> > > > >
> > > > > As I understand it, we're not allowing queries to span FDB transactions
> > > > > so this latter case is not something to worry about?
> > > > >
> > > >
> > > > We are going to allow queries to span transactions. This is already
> > > > implemented for views and will be for Mango.
> > > >
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Will
> > > > >
> > > > > On Wed, 25 Mar 2020 at 19:43, Garren Smith wrote:
> > > > >
> > > > > > On Wed, Mar 25, 2020 at 8:35 PM Paul Davis <paul.joseph.davis@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > > It was therefore felt that having an immediate "Not ready" signal
> > > > > > > > for just _some_ calls to _find, based on the type of backing index,
> > > > > > > > was a bad and confusing API.
> > > > > > > >
> > > > > > > > We also discussed _find calls where the user does not specify an
> > > > > > > > index, and concluded that we would be free to choose between using
> > > > > > > > the _all_docs index (which is always up to date but rarely the best
> > > > > > > > index for a given selector) or blocking to update a better but
> > > > > > > > stale index.
> > > > > > > >
> > > > > > > > Summary-ing my summarisation:
> > > > > > > >
> > > > > > > > 1) if you specify an index, we'll use it even if we have to update
> > > > > > > > it, no matter how long that takes.
> > > > > > > > 2) if you don't specify an index, it's the dealer's choice. The
> > > > > > > > details here may change in point releases.
> > > > > > > >
> > > > > > >
> > > > > > > So it seems there's still a bit of confusion on what the consensus is
> > > > > > > here. The way that I had thought this would work is that we'd do
> > > > > > > something like this:
> > > > > > >
> > > > > > > 1. If user specifies an index, use it even if we have to wait
> > > > > > > 2. If an index is built that can be used, use it
> > > > > > > 3. If an index is building that can be used, wait for it
> > > > > > > 4. As a last resort use _all_docs
> > > > > > >
> > > > > > > Discussing with Garren on the PR, he's of the opinion that we should
> > > > > > > skip step 3 and just go directly to using _all_docs if nothing is
> > > > > > > built.
> > > > > > >
> > > > > >
> > > > > > I just want to clarify step 3. I'm ok with using an index that still
> > > > > > needs to be built as long as there is no other built index that can
> > > > > > service the request.
> > > > > >
> > > > > > So the big thing for me is to always prefer a built index over a
> > > > > > building index. In the situation where there is only 1 building index
> > > > > > versus all docs I'm ok with using the building index.
> > > > > >
> > > > > > >
> > > > > > > My main assumption is that most cases where a user is creating an
> > > > > > > index and then wanting to run a query with it are in the
> > > > > > > design/exploration phase of learning the feature or designing an
> > > > > > > index to use.
> > > > > > > In that scenario, if we skip waiting, it seems likely that a
> > > > > > > user could easily be led to believe that an index creation "worked"
> > > > > > > for their selector when in reality it was just backed by _all_docs.
> > > > > > >
> > > > > > > The other reason for preferring to wait for an index to finish
> > > > > > > building is that the UI for the normal case of creating indexes is a
> > > > > > > bit awkward. Having to run a polling loop around checking the index
> > > > > > > status seems suboptimal in most cases.
> > > > > > >
> > > > > > > Am I missing other cases that would benefit from not waiting and just
> > > > > > > using _all_docs?
> > > > > > >
> > > > > > > Paul
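For illustration only, here is a minimal Python sketch of the four-step selection order debated in this thread. The names `Index`, `usable`, `choose_index`, and `wait_for_building` are hypothetical and are not part of CouchDB or Mango. It assumes, as a simplification, that an index can serve a query when every indexed field appears in the selector, and it reduces "built" to a boolean even though, as Paul notes, different index types may define "built" differently. Step 2 returns any built candidate before step 3 is considered, which gives the "always prefer a built index over a building index" rule Garren asks for.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Sequence, Tuple

ALL_DOCS = "_all_docs"  # the always-up-to-date fallback


@dataclass(frozen=True)
class Index:
    """Hypothetical stand-in for a Mango index definition."""
    name: str
    fields: Tuple[str, ...]
    built: bool  # simplification: True once the index has caught up


def usable(index: Index, selector: Dict[str, Any]) -> bool:
    """Simplified assumption: usable if every indexed field is in the selector."""
    return all(field in selector for field in index.fields)


def choose_index(
    indexes: Sequence[Index],
    selector: Dict[str, Any],
    use_index: Optional[str] = None,
    wait_for_building: bool = True,
) -> Tuple[str, bool]:
    """Return (index name, whether the query must wait for a build)."""
    # Step 1: a user-specified index is used even if we have to wait for it.
    if use_index is not None:
        for index in indexes:
            if index.name == use_index:
                return index.name, not index.built
        raise KeyError(f"unknown index: {use_index}")

    candidates = [index for index in indexes if usable(index, selector)]

    # Step 2: any usable index that is already built wins outright,
    # so a built index is always preferred over a building one.
    for index in candidates:
        if index.built:
            return index.name, False

    # Step 3 (the disputed step): wait for a usable index that is still
    # building. Every remaining candidate is unbuilt here; take the first.
    if wait_for_building and candidates:
        return candidates[0].name, True

    # Step 4: as a last resort, use _all_docs.
    return ALL_DOCS, False


if __name__ == "__main__":
    indexes = [
        Index("by-name", ("name",), built=False),
        Index("by-age", ("age",), built=True),
    ]
    print(choose_index(indexes, {"name": "bob"}))  # ('by-name', True)
    print(choose_index(indexes, {"name": "bob"}, wait_for_building=False))  # ('_all_docs', False)
    print(choose_index(indexes, {"name": "bob", "age": 3}))  # ('by-age', False)
```

With `wait_for_building=True` the sketch follows Paul's four steps. With `False` it follows the three-step order Glynn and Will propose, where a query with no built index falls back to `_all_docs` while the new index builds in the background. Returning the first built candidate is only a placeholder for whatever cost-based choice Mango actually makes among several usable indexes.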