Date: Tue, 12 Jan 2016 20:16:35 +0100
Subject: Re: [POC] Mango Catch All Selector
From: Robert Kowalski
To: "dev@couchdb.apache.org"

thank you all for your feedback!

i like the idea of the error message with a new url.
i agree with garren that it should be a separate endpoint. it takes
some complexity off when explaining each endpoint. maybe: `/_find_slow`?

On Tue, Jan 12, 2016 at 10:36 AM, Jan Lehnardt wrote:
>
>> On 11 Jan 2016, at 19:55, Tony Sun wrote:
>>
>> Hi Robert,
>>
>> Building upon what others have stated above, what do you think about
>> the following:
>>
>> 1) Let the user query without creating an index
>> 2) Return an error message with a new url that has
>> "slow/no_index/developer":true appended at the end. The message clearly
>> explains that this query will be slow, and that creating an index will be
>> more efficient. However, he or she can continue. The error message will
>> then have a link to point to our documentation.
>> 3) In Fauxton, there is a checkbox or button that also appends the
>> "slow/no_index/developer":true to the _find url. If the user clicks it,
>> then the same message pops up to notify the user.
>
> I like this!
>
> Jan
> --
>
>> Tony
>>
>> On Mon, Jan 11, 2016 at 9:45 AM, Eli Stevens (Gmail) wrote:
>>
>>> Just wanted to chime in here as a user - I've run into similar
>>> behavior from CouchDB with the reduce-not-reducing-enough heuristic,
>>> where stuff I was working on went smoothly in dev, but stopped once
>>> real load was pushed through it (thankfully for me, that was in
>>> testing, rather than released to customers).
>>>
>>> It's a frustrating experience, and I don't think that a reputation for
>>> "works until you cross a threshold, and then it doesn't, but only in
>>> production" is a good thing to move towards.
>>>
>>> Perhaps something like adding a key to the returned data along the
>>> lines of "_slow_warning": "This query is going to be slow on large
>>> data sets. See http://..." in addition to the ?slow_warning=true query
>>> param (note that I'm calling it "slow_warning" in both places only to
>>> increase discoverability; without the url param, the no-index query
>>> wouldn't work at all).
>>> Bikeshed the name as needed.
>>>
>>> I'd like to see a lot more URLs in CouchDB error messages in general,
>>> actually - I would find it very useful when trying to determine what's
>>> going wrong to have a URL right there in the logs that I can get more
>>> information from.
>>>
>>> On Sun, Jan 10, 2016 at 11:54 AM, Joan Touzet wrote:
>>>> Hi Robert,
>>>>
>>>> I've been thinking about this one for a week or so, and I have a
>>>> simple suggestion:
>>>>
>>>> Add the query parameter slow=true to enable this behaviour.
>>>>
>>>> This meets all the original requirements:
>>>>
>>>> 1. It is not default behaviour
>>>> 2. You can grep the log files for the word 'slow' and find evidence
>>>> 3. There is a shorthand, simple way to enable the behaviour
>>>> 4. Any self-respecting developer will try to remove slow=true, find
>>>> a break, and be forced to learn about indexes
>>>> 5. It's a bit cheeky, which I think is kind of fun :D
>>>>
>>>> All the best,
>>>> Joan
>>>>
>>>> ----- Original Message -----
>>>>> From: "William Edney"
>>>>> To: dev@couchdb.apache.org
>>>>> Sent: Friday, January 8, 2016 10:27:29 AM
>>>>> Subject: Re: [POC] Mango Catch All Selector
>>>>>
>>>>> Hi Robert -
>>>>>
>>>>> As a builder of UI, API and library code who has also done developer
>>>>> training on a variety of technologies, one simple fix might be to go
>>>>> ahead and not require indexes to be built, but then to put a big NOTE
>>>>> at the beginning of the "Mango Getting Started" guide (I would assume
>>>>> there is such a piece of documentation) that states: "Note that the
>>>>> examples in this document do not require you to build an index, but
>>>>> for performance reasons we HIGHLY RECOMMEND that you do so. *Click
>>>>> here* for more information about how to do that" (or some such
>>>>> verbiage).
>>>>>
>>>>> My 2 cents.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> - Bill
>>>>>
>>>>> On Fri, Jan 8, 2016 at 9:04 AM, Robert Kowalski wrote:
>>>>>
>>>>>> Hi list,
>>>>>>
>>>>>> At the end of the mail I would like to invite the other folks from
>>>>>> the mailing list that build interfaces for humans (APIs, CLIs or
>>>>>> even UIs) to chime in again with their opinions. So all people on
>>>>>> the ML, the mail is not just a response to Paul, feedback is
>>>>>> welcome :)
>>>>>>
>>>>>> Hi Paul, I agree with you on the timeout. It could lead to very
>>>>>> unpleasant errors which are hard to debug and support.
>>>>>>
>>>>>> I added some thoughts to the other points you made:
>>>>>>
>>>>>>> a) know that the slow queries logs exist,
>>>>>>
>>>>>> Hmm... If I take a look at the 1.x logging it was very
>>>>>> straightforward. As a developer you would spin up a CouchDB and you
>>>>>> get all the log messages into your terminal. It was quite handy in
>>>>>> general for all kinds of debugging. That the logs are not displayed
>>>>>> directly on stdout/stderr is in my opinion a general 2.x problem.
>>>>>> The problem does occur with all kinds of log messages we produce in
>>>>>> CouchDB for 2.x and is not specific to the slow-query-logging.
>>>>>>
>>>>>>> Ie, "You can try queries with testing:true, when you're ready to
>>>>>>> move to production you can POST your selector to _index to create
>>>>>>> the index which allows you to remove testing:true".
>>>>>>
>>>>>> I really like the migration path you mentioned here with the API to
>>>>>> create indexes. I am worried about having too high an entry barrier
>>>>>> for absolute newcomers, people who want to play around before they
>>>>>> are ready to think about indexes, e.g. by coupling the index topic
>>>>>> to the querying from the beginning.
>>>>>>
>>>>>> When I throw too many things to learn at people (who may not have
>>>>>> used a database before), most people get discouraged and do not
>>>>>> take a look. The usual things they feel or say are: "too
>>>>>> complicated", "I have not enough time", "product XY is easier to
>>>>>> use".
>>>>>>
>>>>>> I would argue that newcomers to a database will not launch a high
>>>>>> traffic, multi-gigabyte product with the database from day one. Day
>>>>>> one is the day where they learn how to query the data and put data
>>>>>> into the database. Even for scenarios where people have a running
>>>>>> high traffic system, and have used other databases at a medium to
>>>>>> large scale I would expect given they migrate to Couch, that they
>>>>>> run both systems in parallel for the first time in order to fix the
>>>>>> issues that occur during a migration.
>>>>>>
>>>>>> I think we share the same goal (getting beginners started quickly)
>>>>>> and the cool thing about your suggestion is that everyone gets the
>>>>>> required knowledge to run a production system right from the very
>>>>>> start. My suggestion leaves some parts out, but reduces the
>>>>>> cognitive load required to get the very first basic results, e.g.
>>>>>> in a university class setting - or junior developers on their
>>>>>> "casual friday 20% time". My big hope is, once those folks build
>>>>>> high traffic systems, they remember how easy the usage of CouchDB
>>>>>> was and that they start to learn more about CouchDB in order to run
>>>>>> it in a system with more than a few thousand documents.
>>>>>>
>>>>>> For us both I think the "what" is clear, but the "how" is a bit
>>>>>> different. I also think this discussion still makes progress, but I
>>>>>> am afraid it could stall.
>>>>>> I see that we both have very good rudiments and I would like to
>>>>>> invite the other folks from the mailing list that build interfaces
>>>>>> for humans (APIs, CLIs or even UIs) to chime in again with their
>>>>>> opinions - of course I'm also looking forward to your answer :)
>>>>>>
>>>>>> Best,
>>>>>> Robert :)
>>>>>>
>>>>>> On Wed, Jan 6, 2016 at 6:21 PM, Paul Davis wrote:
>>>>>>>>> - is a timeout solving the root cause or the symptoms? Could it
>>>>>>>>> be a temporary or additional step in conjunction with query
>>>>>>>>> optimisation tooling?
>>>>>>>>
>>>>>>>> It really depends. From my CouchDB admin and user perspective,
>>>>>>>> this doesn't seem so important to me right now. However, I
>>>>>>>> recognize that there are different usage scenarios with different
>>>>>>>> requirements (e.g. the ones at Cloudant).
>>>>>>>
>>>>>>> I don't think there's anything special about Cloudant in this
>>>>>>> discussion. It's just a question of how do we allow new users the
>>>>>>> ability to easily test and learn the selector/query API while also
>>>>>>> preventing them from going too far without creating indexes for
>>>>>>> their queries. The slow queries messages are fine, but just as any
>>>>>>> other database they don't really prompt the developer to make the
>>>>>>> correct change. Ie, the developer has to be savvy enough to a) know
>>>>>>> that the slow queries logs exist, b) understand that creating an
>>>>>>> index would speed things up, and then c) know which index to create
>>>>>>> based on the logged query.
>>>>>>>
>>>>>>> In my experience, the group of users that we're concerned about in
>>>>>>> this discussion most likely don't know about any of those three
>>>>>>> things, hence why the current API is designed to force them to
>>>>>>> learn about and understand indexes as part of learning the API.
>>>>>>> Granted the `_id > null` trick muddies that learning process. I
>>>>>>> would think that replacing the _id trick with `"testing": true` or
>>>>>>> similar would be an obvious indication to users that this is a
>>>>>>> dev/debug type feature and when they went to production they would
>>>>>>> still be pushed to using an index. If we add the "create index from
>>>>>>> selector" API then I think this would be a relatively
>>>>>>> straightforward method of on-ramping to both the query and index
>>>>>>> sides of the API. Ie, "You can try queries with testing:true, when
>>>>>>> you're ready to move to production you can POST your selector to
>>>>>>> _index to create the index which allows you to remove
>>>>>>> testing:true".
>>>>>>>
>>>>>>> That's also why I don't particularly care for the timeout approach.
>>>>>>> It's a binary threshold that a user would (maybe) meet after some
>>>>>>> unknown amount of time after they falsely believe their app is
>>>>>>> working correctly. The feedback is "Everything is fine until it
>>>>>>> isn't". Consider an app that's been working for a week or a month
>>>>>>> or more that suddenly starts throwing timeouts for a query. From
>>>>>>> the user's perspective the database broke because the query that
>>>>>>> used to work fine no longer does. And then there's the follow-on
>>>>>>> question on how that timeout might instruct the user that they need
>>>>>>> an index, and that the fix may be as easy as POSTing their selector
>>>>>>> to the _index endpoint. Sure, Google would most likely have the
>>>>>>> answer if our docs are good enough, but by that point the developer
>>>>>>> is probably already experiencing downtime if their app is live
>>>>>>> which means they're frantically trying to fix the thing.
>>>>>>> From my point of view, a few road blocks that guide developers
>>>>>>> towards the correct usage early on would be better than letting
>>>>>>> them get to the adrenaline-fueled expletive fountain of downtime.
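
For readers following the thread, the migration path Paul describes (query with a testing flag first, then POST the selector to _index) can be sketched roughly as below. This is only an illustration under assumptions: the "testing" key is the flag proposed in this thread and not an existing CouchDB option, `index_request_from_selector` is a hypothetical helper standing in for the proposed "create index from selector" API, and it only looks at top-level field names. The request bodies follow the general shape of the existing `_find` and `_index` endpoints; nothing is sent over the network.

```python
import json


def find_request(selector, testing=False):
    """Build a JSON body for POST /{db}/_find.

    The "testing" key is only the flag proposed in this thread; it is
    an assumption here, not an existing CouchDB option.
    """
    body = {"selector": selector}
    if testing:
        body["testing"] = True
    return body


def index_request_from_selector(selector, name):
    """Build a JSON body for POST /{db}/_index from a selector.

    Hypothetical helper: it takes the top-level field names only and
    skips combination operators such as "$and" or "$or". Fields are
    sorted alphabetically just to keep the output deterministic; a real
    index would order fields to match the query pattern.
    """
    fields = sorted(key for key in selector if not key.startswith("$"))
    return {"index": {"fields": fields}, "name": name, "type": "json"}


if __name__ == "__main__":
    selector = {"year": {"$gt": 2010}, "director": "Lars von Trier"}
    # Step 1: while experimenting, query without an index.
    print(json.dumps(find_request(selector, testing=True), sort_keys=True))
    # Step 2: before production, derive an index definition and drop the flag.
    print(json.dumps(index_request_from_selector(selector, "director-year")))
    print(json.dumps(find_request(selector), sort_keys=True))
```

The point of the sketch is that the same selector feeds both requests, which is what would make the move from "testing" to an indexed query a single extra POST.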