couchdb-dev mailing list archives

From Garren Smith <gar...@apache.org>
Subject Re: [POC] Mango Catch All Selector
Date Wed, 13 Jan 2016 05:03:51 GMT
Hi Robert,

I think you misunderstood me; I don’t want it to be a different endpoint.
I just don’t want a user to have to do queries like find({slow: true}). I want them
to be able to do a query e.g. find({}) or find({selector: null}) and then get back the results
along with a warning message telling them that this query would be slow in production.
The lower the barrier for entry here the better. I know we want to protect our users for when
they go to production, but forcing them to add a slow: true flag won’t help. It will still
require them to read the docs a lot more than most people are willing to on a first attempt
of something new.
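
To make the idea concrete, here is a minimal Python sketch of the behaviour described above. It is only an illustration: the in-memory `find` helper, the `index_available` flag (standing in for whatever the query planner would decide) and the `warning` key in the response are all hypothetical names, not taken from the CouchDB code base.

```python
from typing import Any, Dict, List, Optional

# Hypothetical warning text; the real wording would be up for discussion.
SLOW_WARNING = (
    "No index was used for this query. It will be slow on large data sets; "
    "consider creating an index."
)


def matches(doc: Dict[str, Any], selector: Dict[str, Any]) -> bool:
    """Equality-only matching; an empty selector matches every document."""
    return all(doc.get(field) == value for field, value in selector.items())


def find(
    docs: List[Dict[str, Any]],
    selector: Optional[Dict[str, Any]] = None,
    index_available: bool = False,
) -> Dict[str, Any]:
    """Return matching docs, plus a warning when no index backs the query."""
    selector = selector or {}  # treat a null selector like {}
    response: Dict[str, Any] = {
        "docs": [doc for doc in docs if matches(doc, selector)]
    }
    if not index_available:
        # The query still succeeds; the warning travels with the results.
        response["warning"] = SLOW_WARNING
    return response
```

The point of the sketch is that find(docs, {}) and find(docs, None) both return results without any extra flag, and the warning is the only thing that changes once an index exists.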

Cheers
Garren
> On 12 Jan 2016, at 9:16 PM, Robert Kowalski <rok@kowalski.gd> wrote:
> 
> thank you all for your feedback!
> 
> i like the idea of the error message with a new url.
> 
> i agree with garren that it should be a separate endpoint. it takes
> some complexity off when explaining each endpoint.
> 
> maybe: `/_find_slow`?
> 
> On Tue, Jan 12, 2016 at 10:36 AM, Jan Lehnardt <jan@apache.org> wrote:
>> 
>>> On 11 Jan 2016, at 19:55, Tony Sun <tony.sun427@gmail.com> wrote:
>>> 
>>> Hi Robert,
>>> 
>>> Building upon what others have stated above, what do you think about
>>> the following:
>>> 
>>> 1) Let the user query without creating an index
>>> 2) Return an error message with a new url that has
>>> "slow/no_index/developer":true appended at the end. The message clearly
>>> explains that this query will be slow, and that creating an index will be
>>> more efficient. However, he or she can continue. The error message will
>>> then have a link to point to our documentation.
>>> 3) In Fauxton, there is a checkbox or button that also appends the
>>> "slow/no_index/developer":true to the _find url. If the user clicks it,
>>> then the same message pops up to notify the user.
>> 
>> 
>> I like this!
>> 
>> 
>> Jan
>> --
>> 
>>> 
>>> 
>>> 
>>> Tony
>>> 
>>> 
>>> 
>>> On Mon, Jan 11, 2016 at 9:45 AM, Eli Stevens (Gmail) <wickedgrey@gmail.com>
>>> wrote:
>>> 
>>>> Just wanted to chime in here as a user - I've run into similar
>>>> behavior from CouchDB with the reduce-not-reducing-enough heuristic,
>>>> where stuff I was working on went smoothly in dev, but stopped once
>>>> real load was pushed through it (thankfully for me, that was in
>>>> testing, rather than released to customers).
>>>> 
>>>> It's a frustrating experience, and I don't think that a reputation for
>>>> "works until you cross a threshold, and then it doesn't, but only in
>>>> production" is a good thing to move towards.
>>>> 
>>>> Perhaps something like adding a key to the returned data along the
>>>> lines of "_slow_warning": "This query is going to be slow on large
>>>> data sets. See http://..." in addition to the ?slow_warning=true query
>>>> param (note that I'm calling it "slow_warning" in both places only to
>>>> increase discoverability; without the url param, the no-index query
>>>> wouldn't work at all). Bikeshed the name as needed.
>>>> 
>>>> I'd like to see a lot more URLs in CouchDB error messages in general,
>>>> actually - I would find it very useful when trying to determine what's
>>>> going wrong to have a URL right there in the logs that I can get more
>>>> information from.
>>>> 
>>>> On Sun, Jan 10, 2016 at 11:54 AM, Joan Touzet <wohali@apache.org> wrote:
>>>>> Hi Robert,
>>>>> 
>>>>> I've been thinking about this one for a week or so, and I have a
>>>>> simple suggestion:
>>>>> 
>>>>> Add the query parameter slow=true to enable this behaviour.
>>>>> 
>>>>> This meets all the original requirements:
>>>>> 
>>>>> 1. It is not default behaviour
>>>>> 2. You can grep the log files for the word 'slow' and find evidence
>>>>> 3. There is a shorthand, simple way to enable the behaviour
>>>>> 4. Any self-respecting developer will try to remove slow=true, find
>>>>> a break, and be forced to learn about indexes
>>>>> 5. It's a bit cheeky, which I think is kind of fun :D
>>>>> 
>>>>> All the best,
>>>>> Joan
>>>>> 
>>>>> ----- Original Message -----
>>>>>> From: "William Edney" <bedney@technicalpursuit.com>
>>>>>> To: dev@couchdb.apache.org
>>>>>> Sent: Friday, January 8, 2016 10:27:29 AM
>>>>>> Subject: Re: [POC] Mango Catch All Selector
>>>>>> 
>>>>>> Hi Robert -
>>>>>> 
>>>>>> As a builder of UI, API and library code who has also done developer
>>>>>> training on a variety of technologies, one simple fix might be to go
>>>>>> ahead and not require indexes to be built, but then to put a big NOTE
>>>>>> at the beginning of the "Mango Getting Started" guide (I would assume
>>>>>> there is such a piece of documentation) that states: "Note that the
>>>>>> examples in this document do not require you to build an index, but
>>>>>> for performance reasons we HIGHLY RECOMMEND that you do so. *Click
>>>>>> here* for more information about how to do that" (or some such
>>>>>> verbiage).
>>>>>> 
>>>>>> My 2 cents.
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> - Bill
>>>>>> 
>>>>>> On Fri, Jan 8, 2016 at 9:04 AM, Robert Kowalski <rok@kowalski.gd>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi list,
>>>>>>> 
>>>>>>> At the end of the mail I would like to invite the other folks from
>>>>>>> the mailing list that build interfaces for humans (APIs, CLIs or even
>>>>>>> UIs) to chime in again with their opinions. So all people on the ML,
>>>>>>> the mail is not just a response to Paul, feedback is welcome :)
>>>>>>> 
>>>>>>> Hi Paul, I agree with the timeout. It could lead to very unpleasant
>>>>>>> errors which are hard to debug and support.
>>>>>>> 
>>>>>>> I added some thoughts to the other points you made:
>>>>>>> 
>>>>>>>> a) know that the slow queries logs exist,
>>>>>>> 
>>>>>>> Hmm... If I take a look at the 1.x logging it was very
>>>>>>> straightforward. As a developer you would spin up a CouchDB and you
>>>>>>> get all the log messages into your terminal. It was quite handy in
>>>>>>> general for all kinds of debugging. That the logs are not displayed
>>>>>>> directly on stdout/stderr is in my opinion a general 2.x problem.
>>>>>>> The problem does occur with all kinds of log messages we produce in
>>>>>>> CouchDB for 2.x and is not specific to the slow-query-logging.
>>>>>>> 
>>>>>>> 
>>>>>>>> Ie, "You can try queries with testing:true, when you're ready to
>>>>>>>> move to production you can POST your selector to _index to create
>>>>>>>> the index which allows you to remove testing:true".
>>>>>>> 
>>>>>>> I really like the migration path you mentioned here with the API to
>>>>>>> create indexes. I am worried about having too high an entry barrier
>>>>>>> for absolute newcomers, people that you want to play around before
>>>>>>> they are ready to think about indexes, e.g. by coupling the index
>>>>>>> topic to querying from the beginning.
>>>>>>> 
>>>>>>> When I throw too many things to learn at people (who may not have
>>>>>>> used a database before), most people get discouraged and do not
>>>>>>> take a look. The usual things they feel or say are: "too
>>>>>>> complicated", "I have not enough time", "product XY is easier to
>>>>>>> use".
>>>>>>> 
>>>>>>> I would argue that newcomers to a database will not launch a high
>>>>>>> traffic, multi-gigabyte product with the database from day one. Day
>>>>>>> one is the day where they learn how to query the data and put data
>>>>>>> into the database. Even for scenarios where people have a running
>>>>>>> high traffic system, and have used other databases at a medium to
>>>>>>> large scale, I would expect, given they migrate to Couch, that they
>>>>>>> run both systems in parallel at first in order to fix the issues
>>>>>>> that occur during a migration.
>>>>>>> 
>>>>>>> I think we share the same goal (getting beginners started quickly)
>>>>>>> and the cool thing about your suggestion is that everyone gets the
>>>>>>> required knowledge to run a production system right from the very
>>>>>>> start. My suggestion leaves some parts out, but reduces the
>>>>>>> cognitive load required to get the very first basic results, e.g.
>>>>>>> in a university class setting - or junior developers on their
>>>>>>> "casual friday 20% time". My big hope is, once those folks build
>>>>>>> high traffic systems, they remember how easy the usage of CouchDB
>>>>>>> was and that they start to learn more about CouchDB in order to run
>>>>>>> it in a system with more than a few thousand documents.
>>>>>>> 
>>>>>>> 
>>>>>>> For us both I think the "what" is clear, but the "how" is a bit
>>>>>>> different. I also think this discussion still makes progress, but I
>>>>>>> am afraid it could stall. I see that we both have very good
>>>>>>> rudiments and I would like to invite the other folks from the
>>>>>>> mailing list that build interfaces for humans (APIs, CLIs or even
>>>>>>> UIs) to chime in again with their opinions - of course I'm also
>>>>>>> looking forward to your answer :)
>>>>>>> 
>>>>>>> Best,
>>>>>>> Robert :)
>>>>>>> 
>>>>>>> On Wed, Jan 6, 2016 at 6:21 PM, Paul Davis
>>>>>>> <paul.joseph.davis@gmail.com>
>>>>>>> wrote:
>>>>>>>>>> - is a timeout solving the root cause or the symptoms? Could it
>>>>>>>>>> be a temporary or additional step as in conjunction with query
>>>>>>>>>> optimisation tooling?
>>>>>>>>> 
>>>>>>>>> It really depends. From my CouchDB admin and user perspective,
>>>>>>>>> this doesn't seem so important to me right now. However, I
>>>>>>>>> recognize that there are different usage scenarios with different
>>>>>>>>> requirements (e.g. the ones at Cloudant).
>>>>>>>> 
>>>>>>>> I don't think there's anything special about Cloudant in this
>>>>>>>> discussion. It's just a question of how do we allow new users the
>>>>>>>> ability to easily test and learn the selector/query API while also
>>>>>>>> preventing them from going too far without creating indexes for
>>>>>>>> their queries. The slow queries messages are fine, but just as in
>>>>>>>> any other database they don't really prompt the developer to make
>>>>>>>> the correct change. Ie, the developer has to be savvy enough to a)
>>>>>>>> know that the slow queries logs exist, b) understand that creating
>>>>>>>> an index would speed things up, and then c) know which index to
>>>>>>>> create based on the logged query.
>>>>>>>> 
>>>>>>>> In my experience, the group of users that we're concerned about in
>>>>>>>> this discussion most likely don't know about any of those three
>>>>>>>> things, hence why the current API is designed to force them to
>>>>>>>> learn about and understand indexes as part of learning the API.
>>>>>>>> Granted the `_id > null` trick muddies that learning process. I
>>>>>>>> would think that replacing the _id trick with `"testing": true` or
>>>>>>>> similar would be an obvious indication to users that this is a
>>>>>>>> dev/debug type feature and when they went to production they would
>>>>>>>> still be pushed to using an index. If we add the "create index from
>>>>>>>> selector" API then I think this would be a relatively
>>>>>>>> straightforward method to on ramping to both the query and index
>>>>>>>> sides of the API. Ie, "You can try queries with testing:true, when
>>>>>>>> you're ready to move to production you can POST your selector to
>>>>>>>> _index to create the index which allows you to remove
>>>>>>>> testing:true".
>>>>>>>> 
>>>>>>>> That's also why I don't particularly care for the timeout approach.
>>>>>>>> It's a binary threshold that a user would (maybe) meet after some
>>>>>>>> unknown amount of time after they falsely believe their app is
>>>>>>>> working correctly. The feedback is "Everything is fine until it
>>>>>>>> isn't". Consider an app that's been working for a week or a month
>>>>>>>> or more that suddenly starts throwing timeouts for a query. From
>>>>>>>> the user's perspective the database broke because the query that
>>>>>>>> used to work fine no longer does. And then there's the follow on
>>>>>>>> question on how that timeout might instruct the user that they need
>>>>>>>> an index, and that the fix may be as easy as POSTing their selector
>>>>>>>> to the _index endpoint. Sure Google would most likely have the
>>>>>>>> answer if our docs are good enough, but by that point the developer
>>>>>>>> is probably already experiencing downtime if their app is live
>>>>>>>> which means they're frantically trying to fix the thing. From my
>>>>>>>> point of view, a few road blocks that guide developers towards the
>>>>>>>> correct usage early on would be better than letting them get to the
>>>>>>>> adrenaline fueled expletive fountain of downtime.
>>>>>>> 
>>>>>> 
>>>> 
>> 

