Mailing-List: contact dev-help@couchdb.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@couchdb.apache.org
MIME-Version: 1.0
In-Reply-To: <C9052507-0239-40AA-8476-1AECEDD4F745@apache.org>
References: 
 <CA+tVtbAUYtReT8EtQgdKx=Fd_E2saE-pWKNzhCGdT0P=WOwrpw@mail.gmail.com>
	<3956565.315.1452455659661.JavaMail.Joan@RITA>
	<CADa34LBPzx+BwF9oeJH4v2rO7hsB9bV4XeLV9H-sV+ZV3oe6fw@mail.gmail.com>
	<CAPKtrpnMQoqwj5t3wo0nE-hTbBK6mWO9UgA9EycGKAHC4DG9nQ@mail.gmail.com>
	<C9052507-0239-40AA-8476-1AECEDD4F745@apache.org>
Date: Tue, 12 Jan 2016 20:16:35 +0100
Message-ID: 
 <CAJ1bcfGBmOzQA+cds3-NjR4SEHai6ShoXQWigojvjSnWduXg=g@mail.gmail.com>
Subject: Re: [POC] Mango Catch All Selector
From: Robert Kowalski <rok@kowalski.gd>
To: "dev@couchdb.apache.org" <dev@couchdb.apache.org>
Content-Type: text/plain; charset=UTF-8

thank you all for your feedback!

i like the idea of the error message with a new url.

i agree with garren that it should be a separate endpoint. it takes
some complexity off when explaining each endpoint.

maybe: `/_find_slow`?
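
a rough sketch of how the split could look, combining the error-with-a-url idea and the separate endpoint. it is plain python over an in-memory list; every name in it (`find`, `find_slow`, `retry_url`, `no_usable_index`) is made up for illustration and is not how mango is implemented:

```python
# Hypothetical sketch only: none of these names exist in CouchDB or Mango.

def usable_index(selector, indexes):
    """Return the first index whose leading field is in the selector, else None."""
    for index in indexes:
        if index and index[0] in selector:
            return index
    return None


def matches(doc, selector):
    """Equality-only matching; real selectors support operators as well."""
    return all(doc.get(field) == value for field, value in selector.items())


def find(docs, indexes, selector):
    """Sketch of /_find: refuse to scan when no index covers the selector."""
    if usable_index(selector, indexes) is None:
        return 400, {
            "error": "no_usable_index",
            "reason": "No index covers this selector. Create one via _index, "
                      "or retry against /_find_slow to scan every document.",
            "retry_url": "/_find_slow",
        }
    return 200, {"docs": [d for d in docs if matches(d, selector)]}


def find_slow(docs, selector):
    """Sketch of /_find_slow: always scans, never consults an index."""
    return 200, {"docs": [d for d in docs if matches(d, selector)]}
```

the point of the split is that `/_find` only ever has one story to explain (you need an index), and the scan lives somewhere whose name already says it is slow.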

On Tue, Jan 12, 2016 at 10:36 AM, Jan Lehnardt <jan@apache.org> wrote:
>
>> On 11 Jan 2016, at 19:55, Tony Sun <tony.sun427@gmail.com> wrote:
>>
>> Hi Robert,
>>
>> Building upon what others have stated above, what do you think about
>> the following:
>>
>> 1) Let the user query without creating an index
>> 2) Return an error message with a new url that has
>> "slow/no_index/developer":true appended at the end. The message clearly
>> explains that this query will be slow, and that creating an index will be
>> more efficient. However, he or she can continue. The error message will
>> then have a link to point to our documentation.
>> 3) In Fauxton, there is a checkbox or button that also appends the
>> "slow/no_index/developer":true to the _find url. If the user clicks it,
>> then the same message pops up to notify the user.
>
>
> I like this!
>
>
> Jan
> --
>
>>
>>
>>
>> Tony
>>
>>
>>
>> On Mon, Jan 11, 2016 at 9:45 AM, Eli Stevens (Gmail) <wickedgrey@gmail.com>
>> wrote:
>>
>>> Just wanted to chime in here as a user - I've run into similar
>>> behavior from CouchDB with the reduce-not-reducing-enough heuristic,
>>> where stuff I was working on went smoothly in dev, but stopped once
>>> real load was pushed through it (thankfully for me, that was in
>>> testing, rather than released to customers).
>>>
>>> It's a frustrating experience, and I don't think that a reputation for
>>> "works until you cross a threshold, and then it doesn't, but only in
>>> production" is a good thing to move towards.
>>>
>>> Perhaps something like adding a key to the returned data along the
>>> lines of "_slow_warning": "This query is going to be slow on large
>>> data sets. See http://..." in addition to the ?slow_warning=true query
>>> param (note that I'm calling it "slow_warning" in both places only to
>>> increase discoverability; without the url param, the no-index query
>>> wouldn't work at all). Bikeshed the name as needed.
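
inline: a minimal sketch of the opt-in parameter plus warning key described above, assuming an in-memory list of documents. the function names and the error shape are hypothetical, and the docs link is left out of the message because the thread only gives a placeholder:

```python
# Hypothetical sketch; not CouchDB's actual behaviour or API.
SLOW_WARNING = "This query is going to be slow on large data sets."


def scan(docs, selector):
    """Equality-only full scan over every document."""
    return [d for d in docs if all(d.get(k) == v for k, v in selector.items())]


def query(docs, selector, index_available, params):
    """Answer a query; without an index, require the explicit opt-in."""
    opted_in = params.get("slow_warning") == "true"
    if not index_available and not opted_in:
        return {
            "error": "no_usable_index",
            "reason": "create an index, or add ?slow_warning=true to scan",
        }
    # The scan stands in for an index lookup when an index is available.
    body = {"docs": scan(docs, selector)}
    if not index_available:
        body["_slow_warning"] = SLOW_WARNING
    return body
```

using the same name for the parameter and the response key means that whoever reads the response can work out where the behaviour was switched on.
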
>>>
>>> I'd like to see a lot more URLs in CouchDB error messages in general,
>>> actually - I would find it very useful when trying to determine what's
>>> going wrong to have a URL right there in the logs that I can get more
>>> information from.
>>>
>>> On Sun, Jan 10, 2016 at 11:54 AM, Joan Touzet <wohali@apache.org> wrote:
>>>> Hi Robert,
>>>>
>>>> I've been thinking about this one for the past week or so, and I have a
>>>> simple suggestion:
>>>>
>>>> Add the query parameter slow=true to enable this behaviour.
>>>>
>>>> This meets all the original requirements:
>>>>
>>>> 1. It is not default behaviour
>>>> 2. You can grep the log files for the word 'slow' and find evidence
>>>> 3. There is a shorthand, simple way to enable the behaviour
>>>> 4. Any self-respecting developer will try to remove slow=true, find
>>>> a break, and be forced to learn about indexes
>>>> 5. It's a bit cheeky, which I think is kind of fun :D
>>>>
>>>> All the best,
>>>> Joan
>>>>
>>>> ----- Original Message -----
>>>>> From: "William Edney" <bedney@technicalpursuit.com>
>>>>> To: dev@couchdb.apache.org
>>>>> Sent: Friday, January 8, 2016 10:27:29 AM
>>>>> Subject: Re: [POC] Mango Catch All Selector
>>>>>
>>>>> Hi Robert -
>>>>>
>>>>> As a builder of UI, API and library code who has also done developer
>>>>> training on a variety of technologies, one simple fix might be to go
>>>>> ahead and not require indexes to be built, but then to put a big NOTE
>>>>> at the beginning of the "Mango Getting Started" guide (I would assume
>>>>> there is such a piece of documentation) that states: "Note that the
>>>>> examples in this document do not require you to build an index, but for
>>>>> performance reasons we HIGHLY RECOMMEND that you do so. *Click here* for
>>>>> more information about how to do that" (or some such verbiage).
>>>>>
>>>>> My 2 cents.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> - Bill
>>>>>
>>>>> On Fri, Jan 8, 2016 at 9:04 AM, Robert Kowalski <rok@kowalski.gd>
>>>>> wrote:
>>>>>
>>>>>> Hi list,
>>>>>>
>>>>>> At the end of the mail I would like to invite the other folks from
>>>>>> the mailing list that build interfaces for humans (APIs, CLIs or even
>>>>>> UIs) to chime in again with their opinions. So to everyone on the ML:
>>>>>> this mail is not just a response to Paul, feedback is welcome :)
>>>>>>
>>>>>> Hi Paul, I agree with the timeout. It could lead to very unpleasant
>>>>>> errors which are hard to debug and support.
>>>>>>
>>>>>> I added some thoughts to the other points you made:
>>>>>>
>>>>>>> a) know that the slow queries logs exist,
>>>>>>
>>>>>> Hmm... If I take a look at the 1.x logging, it was very
>>>>>> straightforward. As a developer you would spin up a CouchDB and you
>>>>>> get all the log messages in your terminal. It was quite handy in
>>>>>> general for all kinds of debugging. That the logs are not displayed
>>>>>> directly on stdout/stderr is in my opinion a general 2.x problem.
>>>>>> The problem occurs with all kinds of log messages we produce in
>>>>>> CouchDB 2.x and is not specific to the slow-query logging.
>>>>>>
>>>>>>
>>>>>>> Ie, "You can try queries with testing:true, when you're ready to
>>>>>>> move to production you can POST your selector to _index to create
>>>>>>> the index which allows you to remove testing:true".
>>>>>>
>>>>>> I really like the migration path you mentioned here with the API to
>>>>>> create indexes. I am worried about setting too high an entry barrier
>>>>>> for absolute newcomers, people who want to play around before they
>>>>>> are ready to think about indexes, e.g. by coupling the index topic
>>>>>> to querying from the very beginning.
>>>>>>
>>>>>> When I throw too many things to learn at people (who may not have
>>>>>> used a database before), most of them get discouraged and do not take
>>>>>> a look. The usual things they feel or say are: "too complicated", "I
>>>>>> do not have enough time", "product XY is easier to use".
>>>>>>
>>>>>> I would argue that newcomers to a database will not launch a
>>>>>> high-traffic, multi-gigabyte product with the database from day one.
>>>>>> Day one is the day where they learn how to query the data and put
>>>>>> data into the database. Even for scenarios where people have a running
>>>>>> high-traffic system and have used other databases at a medium to
>>>>>> large scale, I would expect that, if they migrate to Couch, they run
>>>>>> both systems in parallel at first in order to fix the issues that
>>>>>> occur during a migration.
>>>>>>
>>>>>> I think we share the same goal (getting beginners started quickly)
>>>>>> and the cool thing about your suggestion is that everyone gets the
>>>>>> required knowledge to run a production system right from the very
>>>>>> start. My suggestion leaves some parts out, but reduces the cognitive
>>>>>> load required to get the very first basic results, e.g. in a
>>>>>> university class setting - or for junior developers on their "casual
>>>>>> friday 20% time". My big hope is that once those folks build
>>>>>> high-traffic systems, they remember how easy CouchDB was to use and
>>>>>> start to learn more about CouchDB in order to run it in a system
>>>>>> with more than a few thousand documents.
>>>>>>
>>>>>>
>>>>>> For us both I think the "what" is clear, but the "how" is a bit
>>>>>> different. I also think this discussion still makes progress, but I
>>>>>> am afraid it could stall. I see that we both have very good rudiments
>>>>>> and I would like to invite the other folks from the mailing list that
>>>>>> build interfaces for humans (APIs, CLIs or even UIs) to chime in
>>>>>> again with their opinions - of course I'm also looking forward to
>>>>>> your answer :)
>>>>>>
>>>>>> Best,
>>>>>> Robert :)
>>>>>>
>>>>>> On Wed, Jan 6, 2016 at 6:21 PM, Paul Davis
>>>>>> <paul.joseph.davis@gmail.com>
>>>>>> wrote:
>>>>>>>>> - is a timeout solving the root cause or the symptoms? Could it
>>>>>>>>> be a
>>>>>>>>> temporary or additional step as in conjunction with query
>>>>>>>>> optimisation
>>>>>>>>> tooling?
>>>>>>>>
>>>>>>>> It really depends. From my CouchDB admin and user perspective,
>>>>>>>> this
>>>>>>>> doesn't seem so important to me right now. However, I recognize
>>>>>>>> that
>>>>>>>> there are different usage scenarios with different requirements
>>>>>>>> (e.g. the
>>>>>>>> ones at Cloudant).
>>>>>>>
>>>>>>> I don't think there's anything special about Cloudant in this
>>>>>>> discussion. It's just a question of how do we allow new users the
>>>>>>> ability to easily test and learn the selector/query API while
>>>>>>> also
>>>>>>> preventing them from going too far without creating indexes for
>>>>>>> their
>>>>>>> queries. The slow queries messages are fine, but just as any
>>>>>>> other
>>>>>>> database they don't really prompt the developer to make the
>>>>>>> correct
>>>>>>> change. Ie, the developer has to be savvy enough to a) know that
>>>>>>> the
>>>>>>> slow queries logs exist, b) understand that creating an index
>>>>>>> would
>>>>>>> speed things up, and then c) know which index to create based on
>>>>>>> the
>>>>>>> logged query.
>>>>>>>
>>>>>>> In my experience, the group of users that we're concerned about
>>>>>>> in
>>>>>>> this discussion most likely don't know about any of those three
>>>>>>> things, hence why the current API is designed to force them to
>>>>>>> learn
>>>>>>> about and understand indexes as part of learning the API. Granted
>>>>>>> the
>>>>>>> `_id > null` trick muddies that learning process. I would think
>>>>>>> that
>>>>>>> replacing the _id trick with `"testing": true` or similar would
>>>>>>> be an
>>>>>>> obvious indication to users that this is a dev/debug type feature
>>>>>>> and
>>>>>>> when they went to production they would still be pushed to using
>>>>>>> an
>>>>>>> index. If we add the "create index from selector" API then I
>>>>>>> think
>>>>>>> this would be a relatively straightforward method of on-ramping
>>>>>>> to
>>>>>>> both the query and index sides of the API. Ie, "You can try
>>>>>>> queries
>>>>>>> with testing:true, when you're ready to move to production you
>>>>>>> can
>>>>>>> POST your selector to _index to create the index which allows you
>>>>>>> to
>>>>>>> remove testing:true".
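
inline: to make the "POST your selector to _index" step concrete, here is a minimal client-side sketch that derives an index definition from a selector. `selector_fields` and `index_request_body` are hypothetical helpers, not part of any proposed API. the body shape follows what `_index` accepts for json indexes as far as i understand it. whether an index on these fields would actually serve a given query (especially with `$or`) is a separate question this sketch does not answer:

```python
# Hypothetical helpers for illustration; not part of CouchDB or Mango.
COMBINATORS = ("$and", "$or", "$nor")


def selector_fields(selector):
    """Collect top-level field names from a selector, in first-seen order."""
    fields = []
    for key, value in selector.items():
        if key in COMBINATORS:
            # Combination operators hold a list of sub-selectors.
            for sub_selector in value:
                for field in selector_fields(sub_selector):
                    if field not in fields:
                        fields.append(field)
        elif not key.startswith("$"):
            # Anything that is not an operator is treated as a field name.
            if key not in fields:
                fields.append(key)
    return fields


def index_request_body(selector):
    """Build a JSON body that could be POSTed to the _index endpoint."""
    return {"index": {"fields": selector_fields(selector)}, "type": "json"}
```

for example, a selector on `type` plus an `$and` over `age` and `type` would yield an index on `["type", "age"]`, which the user could then create before dropping `testing:true`.
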
>>>>>>>
>>>>>>> That's also why I don't particularly care for the timeout
>>>>>>> approach.
>>>>>>> It's a binary threshold that a user would (maybe) meet after some
>>>>>>> unknown amount of time after they falsely believe their app is
>>>>>>> working
>>>>>>> correctly. The feedback is "Everything is fine until it isn't".
>>>>>>> Consider an app that's been working for a week or a month or more
>>>>>>> that
>>>>>>> suddenly starts throwing timeouts for a query. From the user's
>>>>>>> perspective the database broke because the query that used to
>>>>>>> work
>>>>>>> fine no longer does. And then there's the follow-on question of
>>>>>>> how
>>>>>>> that timeout might instruct the user that they need an index, and
>>>>>>> that
>>>>>>> the fix may be as easy as POSTing their selector to the _index
>>>>>>> endpoint. Sure, Google would most likely have the answer if our
>>>>>>> docs
>>>>>>> are good enough, but by that point the developer is probably
>>>>>>> already
>>>>>>> experiencing downtime if their app is live which means they're
>>>>>>> frantically trying to fix the thing. From my point of view, a few
>>>>>>> road
>>>>>>> blocks that guide developers towards the correct usage early on
>>>>>>> would
>>>>>>> be better than letting them get to the adrenaline-fueled
>>>>>>> expletive
>>>>>>> fountain of downtime.
>>>>>>
>>>>>
>>>
>