Mailing-List: contact dev-help@couchdb.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@couchdb.apache.org
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 9.2 \(3112\))
Subject: Re: [POC] Mango Catch All Selector
From: Jan Lehnardt <jan@apache.org>
In-Reply-To: 
 <CAJ1bcfEWAY30vWVBxRvQr2LJF71y4q3_NR32zxj2_t9XtaPCBQ@mail.gmail.com>
Date: Mon, 18 Jan 2016 12:59:49 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <6575721C-59B4-4E62-8B0A-EAAA8AB40E50@apache.org>
References: <23883201.357.1452724881438.JavaMail.Joan@Brain>
 <038689A5-1BE6-446C-96CB-B29C723D3F9A@apache.org>
 <CAJ1bcfEWAY30vWVBxRvQr2LJF71y4q3_NR32zxj2_t9XtaPCBQ@mail.gmail.com>
To: dev@couchdb.apache.org

This is awesome: +1


> On 18 Jan 2016, at 00:16, Robert Kowalski <rok@kowalski.gd> wrote:
>
> Heya,
>
> thanks again for all the feedback! I built a prototype and added a demo video!
>
>> I think the current design constraint around text is a good one, and I'm
>> unconvinced including English text is a good direction.
>>
>> If you want to take this direction, including a URL to our documentation
>> instead (which *is* internationalized) is probably a better way to go,
>> something like:
>> .... {"_warning": "http://docs.couchdb.org/en/2.0.0/....."}]
>
> I really like this idea! I thought long about it and I think it grows
> the scope of the current task. Right now all strings CouchDB returns
> to the user are written in English. The current message that no index
> exists is also in English. Sadly our documentation is not
> internationalised yet - afaik no language has a complete translation
> and the translations are not available as a website or in any other
> public form. I stopped translating to German myself as the promised
> integration into the doc build was never finished in ~1.5 years. For
> the specific task right now I would like to keep the scope as small as
> possible. This does not mean that I would stand in the way if folks
> want to add i18n to the project and its sub-projects and have the
> tooling and time to maintain it.
>
>
> Because a prototype speaks more than 1000 posts I hacked a prototype
> which includes the warning that was proposed by Garren. You can check
> it out at https://github.com/apache/couchdb-mango/pull/27 - or watch
> the video: https://cloudup.com/cEnbWqbX5Y7
>
> What do you think?
>
> On Wed, Jan 13, 2016 at 11:58 PM, Jan Lehnardt <jan@apache.org> wrote:
>>
>>> On 13 Jan 2016, at 23:41, Joan Touzet <wohali@apache.org> wrote:
>>>
>>> Warning: If we start using English text in a response such as this, we'll
>>> need to start externalising strings and internationalising them. We've never
>>> had to do this before because our API is, in general, terse and relies on
>>> HTTP status codes to indicate when something has gone wrong.
>>>
>>> I think the current design constraint around text is a good one, and I'm
>>> unconvinced including English text is a good direction.
>>>
>>> If you want to take this direction, including a URL to our documentation
>>> instead (which *is* internationalized) is probably a better way to go,
>>> something like:
>>>
>>> .... {"_warning": "http://docs.couchdb.org/en/2.0.0/....."}]
>>
>> bikeshed: maybe slow_warning (like we use not_found on 404s), but yeah,
>> something like this!
>>
>> Great discussion everyone. I like how we are all making this idea better
>> together :)
>>
>> Best
>> Jan
>> --
>>
>>>
>>> ----- Original Message -----
>>> From: "Robert Kowalski" <rok@kowalski.gd>
>>> To: dev@couchdb.apache.org
>>> Sent: Wednesday, January 13, 2016 2:47:27 PM
>>> Subject: Re: [POC] Mango Catch All Selector
>>>
>>> Hi Garren,
>>>
>>> what would selector: null do? Return all docs?
>>>
>>> Where in the response from CouchDB would the warning go? Next to the
>>> result set, like
>>>
>>> [{"_id": "foo", "_rev": "535"}, {"_warning": "slow query, use an index for
>>> better performance"}] ?
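
To make the response shape above concrete, here is a minimal Python sketch of
how a client might separate documents from a trailing warning entry. The
`_warning` key and its placement inside the result array are only the proposal
under discussion, not existing CouchDB behaviour, and `split_warnings` is a
hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple


def split_warnings(rows: List[Dict[str, Any]]) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Separate real documents from warning-only entries in a result list."""
    docs: List[Dict[str, Any]] = []
    warnings: List[str] = []
    for row in rows:
        # Assumption: a warning entry carries "_warning" and has no "_id".
        if "_warning" in row and "_id" not in row:
            warnings.append(row["_warning"])
        else:
            docs.append(row)
    return docs, warnings


rows = [
    {"_id": "foo", "_rev": "535"},
    {"_warning": "slow query, use an index for better performance"},
]
docs, warnings = split_warnings(rows)
```

Mixing the warning into the same array as the documents means every client has
to filter it out like this, which is one argument for a separate top-level key.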
>>>
>>> On Wednesday, 13 January 2016, Garren Smith wrote:
>>>
>>>> Hi Robert,
>>>>
>>>> I think you misunderstood me, I don't want it to be a different endpoint.
>>>> I just don't want a user to have to do queries like this:
>>>> find({slow: true}). I want them to be able to do a query e.g. find({}) or
>>>> find({selector: null}) and then get back the results along with a warning
>>>> message telling them that this query would be slow in production.
>>>> The lower the barrier for entry here the better. I know we want to protect
>>>> our users for when they go to production, but forcing them to add a
>>>> slow: true flag won't help. It will still require them to read the docs a
>>>> lot more than most people are willing to on a first attempt of something
>>>> new.
>>>>
>>>> Cheers
>>>> Garren
>>>>> On 12 Jan 2016, at 9:16 PM, Robert Kowalski <rok@kowalski.gd> wrote:
>>>>>
>>>>> thank you all for your feedback!
>>>>>
>>>>> i like the idea of the error message with a new url.
>>>>>
>>>>> i agree with garren that it should be a separate endpoint. it takes
>>>>> some complexity off when explaining each endpoint.
>>>>>
>>>>> maybe: `/_find_slow`?
>>>>>
>>>>> On Tue, Jan 12, 2016 at 10:36 AM, Jan Lehnardt <jan@apache.org> wrote:
>>>>>>
>>>>>>> On 11 Jan 2016, at 19:55, Tony Sun <tony.sun427@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Robert,
>>>>>>>
>>>>>>> Building upon what others have stated above, what do you think about
>>>>>>> the following:
>>>>>>>
>>>>>>> 1) Let the user query without creating an index
>>>>>>> 2) Return an error message with a new url that has
>>>>>>> "slow/no_index/developer":true appended at the end. The message clearly
>>>>>>> explains that this query will be slow, and that creating an index will
>>>>>>> be more efficient. However, he or she can continue. The error message
>>>>>>> will then have a link to point to our documentation.
>>>>>>> 3) In Fauxton, there is a checkbox or button that also appends the
>>>>>>> "slow/no_index/developer":true to the _find url. If the user clicks it,
>>>>>>> then the same message pops up to notify the user.
>>>>>>
>>>>>>
>>>>>> I like this!
>>>>>>
>>>>>>
>>>>>> Jan
>>>>>> --
>>>>>>
>>>>>>>
>>>>>>> Tony
>>>>>>>
>>>>>>> On Mon, Jan 11, 2016 at 9:45 AM, Eli Stevens (Gmail)
>>>>>>> <wickedgrey@gmail.com> wrote:
>>>>>>>
>>>>>>>> Just wanted to chime in here as a user - I've run into similar
>>>>>>>> behavior from CouchDB with the reduce-not-reducing-enough heuristic,
>>>>>>>> where stuff I was working on went smoothly in dev, but stopped once
>>>>>>>> real load was pushed through it (thankfully for me, that was in
>>>>>>>> testing, rather than released to customers).
>>>>>>>>
>>>>>>>> It's a frustrating experience, and I don't think that a reputation for
>>>>>>>> "works until you cross a threshold, and then it doesn't, but only in
>>>>>>>> production" is a good thing to move towards.
>>>>>>>>
>>>>>>>> Perhaps something like adding a key to the returned data along the
>>>>>>>> lines of "_slow_warning": "This query is going to be slow on large
>>>>>>>> data sets. See http://..." in addition to the ?slow_warning=true query
>>>>>>>> param (note that I'm calling it "slow_warning" in both places only to
>>>>>>>> increase discoverability; without the url param, the no-index query
>>>>>>>> wouldn't work at all). Bikeshed the name as needed.
>>>>>>>>
>>>>>>>> I'd like to see a lot more URLs in CouchDB error messages in general,
>>>>>>>> actually - I would find it very useful when trying to determine what's
>>>>>>>> going wrong to have a URL right there in the logs that I can get more
>>>>>>>> information from.
>>>>>>>>
>>>>>>>> On Sun, Jan 10, 2016 at 11:54 AM, Joan Touzet <wohali@apache.org> wrote:
>>>>>>>>> Hi Robert,
>>>>>>>>>
>>>>>>>>> I've been thinking about this one for a week or so, and I have a
>>>>>>>>> simple suggestion:
>>>>>>>>>
>>>>>>>>> Add the query parameter slow=true to enable this behaviour.
>>>>>>>>>
>>>>>>>>> This meets all the original requirements:
>>>>>>>>>
>>>>>>>>> 1. It is not default behaviour
>>>>>>>>> 2. You can grep the log files for the word 'slow' and find evidence
>>>>>>>>> 3. There is a shorthand, simple way to enable the behaviour
>>>>>>>>> 4. Any self-respecting developer will try to remove slow=true, find
>>>>>>>>> a break, and be forced to learn about indexes
>>>>>>>>> 5. It's a bit cheeky, which I think is kind of fun :D
>>>>>>>>>
>>>>>>>>> All the best,
>>>>>>>>> Joan
>>>>>>>>>
>>>>>>>>> ----- Original Message -----
>>>>>>>>>> From: "William Edney" <bedney@technicalpursuit.com>
>>>>>>>>>> To: dev@couchdb.apache.org
>>>>>>>>>> Sent: Friday, January 8, 2016 10:27:29 AM
>>>>>>>>>> Subject: Re: [POC] Mango Catch All Selector
>>>>>>>>>>
>>>>>>>>>> Hi Robert -
>>>>>>>>>>
>>>>>>>>>> As a builder of UI, API and library code who has also done developer
>>>>>>>>>> training on a variety of technologies, one simple fix might be to go
>>>>>>>>>> ahead and not require indexes to be built, but then to put a big NOTE
>>>>>>>>>> at the beginning of the "Mango Getting Started" guide (I would assume
>>>>>>>>>> there is such a piece of documentation) that states: "Note that the
>>>>>>>>>> examples in this document do not require you to build an index, but
>>>>>>>>>> for performance reasons we HIGHLY RECOMMEND that you do so. *Click
>>>>>>>>>> here* for more information about how to do that" (or some such
>>>>>>>>>> verbiage).
>>>>>>>>>>
>>>>>>>>>> My 2 cents.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> - Bill
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 8, 2016 at 9:04 AM, Robert Kowalski <rok@kowalski.gd>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi list,
>>>>>>>>>>>
>>>>>>>>>>> At the end of the mail I would like to invite the other folks from
>>>>>>>>>>> the mailing list that build interfaces for humans (APIs, CLIs or
>>>>>>>>>>> even UIs) to chime in again with their opinions. So all people on
>>>>>>>>>>> the ML, the mail is not just a response to Paul, feedback is
>>>>>>>>>>> welcome :)
>>>>>>>>>>>
>>>>>>>>>>> Hi Paul, I agree with you about the timeout. It could lead to very
>>>>>>>>>>> unpleasant errors which are hard to debug and support.
>>>>>>>>>>>
>>>>>>>>>>> I added some thoughts to the other points you made:
>>>>>>>>>>>
>>>>>>>>>>>> a) know that the slow queries logs exist,
>>>>>>>>>>>
>>>>>>>>>>> Hmm... If I take a look at the 1.x logging it was very
>>>>>>>>>>> straightforward. As a developer you would spin up a CouchDB and you
>>>>>>>>>>> get all the log messages into your terminal. It was quite handy in
>>>>>>>>>>> general for all kinds of debugging. That the logs are not displayed
>>>>>>>>>>> directly on stdout/stderr is in my opinion a general 2.x problem.
>>>>>>>>>>> The problem does occur with all kinds of log messages we produce in
>>>>>>>>>>> CouchDB for 2.x and is not specific to the slow-query-logging.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Ie, "You can try queries with testing:true, when you're ready to
>>>>>>>>>>>> move to production you can POST your selector to _index to create
>>>>>>>>>>>> the index which allows you to remove testing:true".
>>>>>>>>>>>
>>>>>>>>>>> I really like the migration path you mentioned here with the API to
>>>>>>>>>>> create indexes. I am worried about having too high an entry barrier
>>>>>>>>>>> for absolute newcomers, people that you want to play around before
>>>>>>>>>>> they are ready to think about indexes, e.g. by coupling the index
>>>>>>>>>>> topic to the querying from the beginning.
>>>>>>>>>>>
>>>>>>>>>>> When I throw too many things to learn at people (who may not have
>>>>>>>>>>> used a database before), most people get discouraged and do not
>>>>>>>>>>> take a look. The usual things they feel or say are: "too
>>>>>>>>>>> complicated", "I have not enough time", "product XY is easier to
>>>>>>>>>>> use".
>>>>>>>>>>>
>>>>>>>>>>> I would argue that newcomers to a database will not launch a high
>>>>>>>>>>> traffic, multi-gigabyte product with the database from day one. Day
>>>>>>>>>>> one is the day where they learn how to query the data and put data
>>>>>>>>>>> into the database. Even for scenarios where people have a running
>>>>>>>>>>> high traffic system, and have used other databases at a medium to
>>>>>>>>>>> large scale, I would expect, given they migrate to Couch, that they
>>>>>>>>>>> run both systems in parallel at first in order to fix the issues
>>>>>>>>>>> that occur during a migration.
>>>>>>>>>>>
>>>>>>>>>>> I think we share the same goal (getting beginners started quickly)
>>>>>>>>>>> and the cool thing about your suggestion is that everyone gets the
>>>>>>>>>>> required knowledge to run a production system right from the very
>>>>>>>>>>> start. My suggestion leaves some parts out, but reduces the
>>>>>>>>>>> cognitive load required to get the very first basic results, e.g.
>>>>>>>>>>> in a university class setting - or junior developers on their
>>>>>>>>>>> "casual friday 20% time". My big hope is, once those folks build
>>>>>>>>>>> high traffic systems, they remember how easy the usage of CouchDB
>>>>>>>>>>> was and that they start to learn more about CouchDB in order to run
>>>>>>>>>>> it in a system with more than a few thousand documents.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> For us both I think the "what" is clear, but the "how" is a bit
>>>>>>>>>>> different. I also think this discussion still makes progress, but I
>>>>>>>>>>> am afraid it could stall. I see that we both have very good
>>>>>>>>>>> rudiments and I would like to invite the other folks from the
>>>>>>>>>>> mailing list that build interfaces for humans (APIs, CLIs or even
>>>>>>>>>>> UIs) to chime in again with their opinions - of course I'm also
>>>>>>>>>>> looking forward to your answer :)
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Robert :)
>>>>>>>>>>>=20
>>>>>>>>>>> On Wed, Jan 6, 2016 at 6:21 PM, Paul Davis
>>>>>>>>>>> <paul.joseph.davis@gmail.com> wrote:
>>>>>>>>>>>>>> - is a timeout solving the root cause or the symptoms? Could it
>>>>>>>>>>>>>> be a temporary or additional step in conjunction with query
>>>>>>>>>>>>>> optimisation tooling?
>>>>>>>>>>>>>
>>>>>>>>>>>>> It really depends. From my CouchDB admin and user perspective,
>>>>>>>>>>>>> this doesn't seem so important to me right now. However, I
>>>>>>>>>>>>> recognize that there are different usage scenarios with different
>>>>>>>>>>>>> requirements (e.g. the ones at Cloudant).
>>>>>>>>>>>>
>>>>>>>>>>>> I don't think there's anything special about Cloudant in this
>>>>>>>>>>>> discussion. It's just a question of how do we allow new users the
>>>>>>>>>>>> ability to easily test and learn the selector/query API while also
>>>>>>>>>>>> preventing them from going too far without creating indexes for
>>>>>>>>>>>> their queries. The slow queries messages are fine, but just as any
>>>>>>>>>>>> other database they don't really prompt the developer to make the
>>>>>>>>>>>> correct change. Ie, the developer has to be savvy enough to a) know
>>>>>>>>>>>> that the slow queries logs exist, b) understand that creating an
>>>>>>>>>>>> index would speed things up, and then c) know which index to create
>>>>>>>>>>>> based on the logged query.
>>>>>>>>>>>>
>>>>>>>>>>>> In my experience, the group of users that we're concerned about in
>>>>>>>>>>>> this discussion most likely don't know about any of those three
>>>>>>>>>>>> things, hence why the current API is designed to force them to
>>>>>>>>>>>> learn about and understand indexes as part of learning the API.
>>>>>>>>>>>> Granted the `_id > null` trick muddies that learning process. I
>>>>>>>>>>>> would think that replacing the _id trick with `"testing": true` or
>>>>>>>>>>>> similar would be an obvious indication to users that this is a
>>>>>>>>>>>> dev/debug type feature and when they went to production they would
>>>>>>>>>>>> still be pushed to using an index. If we add the "create index from
>>>>>>>>>>>> selector" API then I think this would be a relatively
>>>>>>>>>>>> straightforward method of on-ramping to both the query and index
>>>>>>>>>>>> sides of the API. Ie, "You can try queries with testing:true, when
>>>>>>>>>>>> you're ready to move to production you can POST your selector to
>>>>>>>>>>>> _index to create the index which allows you to remove
>>>>>>>>>>>> testing:true".
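
A rough Python sketch of what "POST your selector to _index" could mean on the
client side: derive an index definition from the top-level fields of a
selector. The `{"index": {"fields": [...]}}` body follows the shape the Mango
`_index` endpoint accepts; `index_body_from_selector` is a hypothetical helper,
the "create index from selector" API itself is only a proposal here, and the
sketch ignores nested fields and combination operators such as `$and`.

```python
from typing import Any, Dict


def index_body_from_selector(selector: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON index definition from a selector's top-level field names."""
    # Keys starting with "$" are operators, not document fields.
    fields = sorted(key for key in selector if not key.startswith("$"))
    if not fields:
        raise ValueError("selector has no top-level fields to index")
    # Field order is alphabetical here purely for determinism; a real index
    # would want an order chosen to match the query.
    return {"index": {"fields": fields}, "type": "json"}


body = index_body_from_selector(
    {"year": {"$gt": 2010}, "title": "Live And Let Die"}
)
```

The resulting `body` is what a client would send to the database's `_index`
endpoint before dropping the testing flag from its queries.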
>>>>>>>>>>>>
>>>>>>>>>>>> That's also why I don't particularly care for the timeout approach.
>>>>>>>>>>>> It's a binary threshold that a user would (maybe) meet after some
>>>>>>>>>>>> unknown amount of time after they falsely believe their app is
>>>>>>>>>>>> working correctly. The feedback is "Everything is fine until it
>>>>>>>>>>>> isn't". Consider an app that's been working for a week or a month
>>>>>>>>>>>> or more that suddenly starts throwing timeouts for a query. From
>>>>>>>>>>>> the user's perspective the database broke because the query that
>>>>>>>>>>>> used to work fine no longer does. And then there's the follow on
>>>>>>>>>>>> question on how that timeout might instruct the user that they
>>>>>>>>>>>> need an index, and that the fix may be as easy as POSTing their
>>>>>>>>>>>> selector to the _index endpoint. Sure Google would most likely have
>>>>>>>>>>>> the answer if our docs are good enough, but by that point the
>>>>>>>>>>>> developer is probably already experiencing downtime if their app is
>>>>>>>>>>>> live which means they're frantically trying to fix the thing. From
>>>>>>>>>>>> my point of view, a few road blocks that guide developers towards
>>>>>>>>>>>> the correct usage early on would be better than letting them get to
>>>>>>>>>>>> the adrenaline fueled expletive fountain of downtime.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>