Subject: Re: db/_all_conflicts
From: Jan Lehnardt
Date: Wed, 30 Mar 2016 15:26:41 +0200
To: dev@couchdb.apache.org

> On 29 Mar 2016, at 20:14, Adam Kocoloski wrote:
>
> Neat stuff. Years ago I actually committed this feature to the codebase using a table scan and then Damien backed it out because of the scalability concern. Glad to see we’re approaching it in a more considered fashion this time around :)
>
> One thing we might consider is to maintain a *count* of the number of conflicted documents in the database automatically. If the count is nonzero when you expected it to be zero, build the conflicted documents index and do your inspection. In the happy case where there are no conflicts we just saved you a bunch of effort.
>
> We don’t really need a separate index to accomplish this; we just need to modify the reducer function supplied to the by_id btree. We’ve played that game before to add things like data size accumulators to the DB info object. There may be a modest hit to the write performance to count the number of non-deleted leafs in the rev tree on document update, but honestly that says as much about the inefficiencies in couch_key_tree as anything else - that quantity ought to be very cheap to uncover.

Bob Newson and I talked about this on IRC some more and I think this is all similar if not the same thinking: remember how we optimised `skip` in view results? We could keep track of the number of conflicts per b-tree node and then easily skip over the subtrees that don’t have any conflicts, so a table-scan would be relatively cheap.
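
A minimal sketch of the per-node count idea, assuming a simplified in-memory tree rather than CouchDB's actual by_id btree. The helpers `leaf`, `inner` and `conflictedIds` and the `leafRevs` field (number of non-deleted leaf revisions) are hypothetical names for illustration only; a document counts as conflicted when it has more than one such leaf.

```javascript
// Hypothetical stand-in for a by_id b-tree: leaf nodes hold documents,
// inner nodes cache how many conflicted documents live below them.
function leaf(docs) {
  // A doc is treated as conflicted when it has more than one non-deleted leaf rev.
  const conflicts = docs.filter((d) => d.leafRevs > 1).length;
  return { docs, conflicts };
}

function inner(children) {
  // The "reduction" for an inner node is the sum of its children's counts.
  const conflicts = children.reduce((sum, c) => sum + c.conflicts, 0);
  return { children, conflicts };
}

// Collect ids of conflicted docs, never descending into a subtree whose count is zero.
function conflictedIds(node) {
  if (node.conflicts === 0) return [];
  if (node.docs) {
    return node.docs.filter((d) => d.leafRevs > 1).map((d) => d.id);
  }
  return node.children.flatMap((c) => conflictedIds(c));
}

const tree = inner([
  leaf([{ id: "a", leafRevs: 1 }, { id: "b", leafRevs: 1 }]),
  leaf([{ id: "c", leafRevs: 2 }, { id: "d", leafRevs: 1 }]),
  leaf([{ id: "e", leafRevs: 1 }]),
]);
```

Here `tree.conflicts` plays the role of the database-wide count: when it is zero the scan ends at the root, and otherwise `conflictedIds(tree)` only walks the subtrees that report a nonzero count.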

Best
Jan
--

>
> Adam
>
>> On Mar 29, 2016, at 1:26 PM, Robert Kowalski wrote:
>>
>> Hi,
>>
>> good points!
>>
>>> 3.1. An optimisation of 3. would be making this an Erlang view, but that would come with
>>> the additional security concern of opening up Erlang views.
>>
>> The great thing about Mango is, with an index Mango is faster than JS
>> views as it is Erlang based.
>>
>>
>> And Dale is making a good suggestion.
>>
>> ```
>> {
>>   "selector": {
>>     "_conflicts": {"$exists": true}
>>   }
>> }
>> ```
>>
>> The selector already works without an index with the latest change in
>> Mango, it doesn't strictly require an index for ad-hoc queries any
>> more: https://github.com/apache/couchdb-mango/commit/01252f971bef0c8da1d78bf5a7b506b71926ce1b
>>
>> Cool so we are already almost done! :)
>>
>> This is great for development and I wonder if we could reduce the
>> friction for people that would like to use an index for conflicts,
>> e.g. in their production systems. Remember, the mission is to make
>> conflict handling a first class citizen in CouchDB and make it as easy
>> as possible for our users.
>>
>> Current state:
>>
>> POST to `$DB/_index`:
>>
>> ```
>> {
>>   "index": {
>>     "fields": ["_conflicts"]
>>   },
>>   "name" : "conflict-index",
>>   "type" : "json"
>> }
>> ```
>>
>> I feel it is hard to type on the terminal, e.g. when I use curl. With
>> a JS HTTP client it is also a lot to type.
>>
>>
>> I thought about API sugar. I feel unsure about API sugar which could
>> abstract this somehow, as I don't want to pollute the API. At the same
>> time I would also like to make it as easy as possible for users to
>> handle their conflicts.
>>
>> Rough idea:
>>
>> POST to `$DB/_index`:
>>
>> ```
>> { "type" : "conflicts" }
>> ```
>>
>> Hmmm....
>>
>> What do you think?
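
To make the "current state" above concrete, here is a rough sketch that builds the two request descriptions discussed in the thread: one to create the `_conflicts` index via `$DB/_index`, one to query it via `$DB/_find`. The helper names and the database URL `http://localhost:5984/mydb` are assumptions for illustration, and whether the `_conflicts` selector behaves as hoped depends on the Mango version in use. Nothing is sent over the network here.

```javascript
// Hypothetical helper: describes the POST that creates a Mango index on _conflicts.
function conflictIndexRequest(dbUrl) {
  return {
    url: `${dbUrl}/_index`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      index: { fields: ["_conflicts"] },
      name: "conflict-index",
      type: "json",
    }),
  };
}

// Hypothetical helper: describes the POST that asks Mango for docs with conflicts.
function conflictFindRequest(dbUrl) {
  return {
    url: `${dbUrl}/_find`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      selector: { _conflicts: { $exists: true } },
    }),
  };
}

// The database URL is an assumption; substitute your own.
const indexReq = conflictIndexRequest("http://localhost:5984/mydb");
const findReq = conflictFindRequest("http://localhost:5984/mydb");
```

With a `fetch`-capable runtime, each description could be sent with something like `fetch(findReq.url, findReq)`; wrapping the bodies in small helpers like these is one way to cut down on typing without adding anything to the server API.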
>>
>> On Mon, Mar 14, 2016 at 4:54 PM, Jan Lehnardt wrote:
>>>
>>>> On 14 Mar 2016, at 16:22, Dale Harvey wrote:
>>>>
>>>> I would really like to give users better abilities to handle conflict
>>>> resolution, I am however extremely worried about introducing
>>>> another API endpoint. We have like 6/7 read APIs, each of them having their
>>>> own idiosyncrasies, and it's extremely confusing for users to know which to
>>>> use when.
>>>>
>>>> If we could extend our existing APIs to cater for this use case it seems
>>>> hugely preferable, ie something like mango / pouchdb find
>>>>
>>>> db.find({
>>>>   selector: {
>>>>     _conflicts: {'$exists': true}
>>>>   }
>>>> }).then(function (result) {
>>>>   ...
>>>> });
>>>
>>> Great input Dale!
>>>
>>> Let’s split this into two issues then:
>>>
>>> A. how do we get the information.
>>> B. how do we present it to users.
>>>
>>>
>>> As for A., the thought process went like this:
>>>
>>> 1. _all_docs + Erlang filter.
>>>
>>> As Robert pointed out, that’s a no-go for large databases.
>>>
>>>
>>> 2. Add another index to the main database file like by-seq/by-id (_changes/_all_docs)
>>>
>>> I pointed out that this will make all write operations slower, for everyone, not just for the people who want this. (A scenario where I wouldn’t want this is where CouchDB is the cloud-counterpart for one or more PouchDB instances, and conflict resolution only ever happens in PouchDB).
>>>
>>> So I’d say this is a soft-no on adding this to the main database file, also given that we had similar discussions about adding another index to view files before.
>>>
>>>
>>> 3. A view: Fauxton could hide creating a ddoc behind a button, and users could opt into this easily, while understanding the trade-offs.
>>>
>>> Robert feels like tying this to Fauxton as opposed to CouchDB makes this approach useful for fewer people than it could (props for not being focussed on your own project there ;)
>>>
>>>
>>> 3.1. An optimisation of 3. would be making this an Erlang view, but that would come with the additional security concern of opening up Erlang views.
>>>
>>>
>>> 4. Given all of the above, how about this: a new CouchDB module (couch_conflicts) that is essentially an Erlang view for conflicts that is disabled by default, but when enabled uses the native query server to build an index that can give the list of conflicting documents (and the conflicting revisions?) *without* having to enable the native query server for everyone. The module can be enabled in the config (or admin PUT to the endpoint as other things in 2.0). We’d also build a basic keep-view-indexes-up-to-date that would trigger an update after, say, 1000 doc updates (we’d make that configurable of course), something which we’d want for other views as well anyway.
>>>
>>> * * *
>>>
>>> As for B., how we present this to the user I have no strong feelings about. We could make this part of Mango, like Dale suggested, or a new /db/_all_conflicts with its own idiosyncrasies or something else ;)
>>>
>>>
>>> I just want to make sure we make the right trade-offs on the storage/indexing level, and, while not making everyone pay for the overhead, make it really easy to opt into this feature. (Unless we all agree that the performance hit for 2. is worth it :)
>>>
>>>
>>> Best
>>> Jan
>>> --
>>>
>>>>
>>>>
>>>> On 14 March 2016 at 14:07, Sebastian Rothbucher <sebastianrothbucher@googlemail.com> wrote:
>>>>
>>>>> Hi Robert,
>>>>>
>>>>> this looks awesome already! I don't want to be the spoiler in this, but
>>>>> wouldn't conflicts occur recently, e.g.
>>>>> using _changes (descending) might
>>>>> do the trick of limit-ing? (Still you'd discard docs that simply don't have
>>>>> conflicts, but probably way not that many)
>>>>>
>>>>> If that doesn't do the trick: just forget what I just said ;-)
>>>>>
>>>>> Best
>>>>> Sebastian
>>>>>
>>>>> On Mon, Mar 14, 2016 at 2:58 PM, Robert Kowalski wrote:
>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> it is hackweek for the Fauxton team and I am lucky enough to be able
>>>>>> to work on whatever I want :)
>>>>>>
>>>>>> Conflicts are an integral part of CouchDB. Right now I dream of making
>>>>>> conflict-resolution a first class citizen in Couch. Conflict
>>>>>> resolution requires a lot of manual steps. The idea is to give the
>>>>>> user all the tools they need to easily solve conflicts, and also to
>>>>>> help them to avoid conflicts in the future.
>>>>>>
>>>>>> To empower every user to detect and solve conflicts easily on their
>>>>>> own, instead of writing some custom bash/js scripts and custom view
>>>>>> hackery I would like to have a list of conflicts in Fauxton for every
>>>>>> database.
>>>>>>
>>>>>> The list, provided by Couch, shows which documents have conflicts. I
>>>>>> can then click on the conflicting doc and get a nice diffing editor
>>>>>> which helps me to solve the conflict. Here's an early draft: [1]
>>>>>>
>>>>>> Discussing the matter in couchdb-dev we thought about server-side
>>>>>> filtering of _all_docs - which is a problem for large databases.
>>>>>>
>>>>>> Another option is a new endpoint, e.g. /db/_all_conflicts. Behind this
>>>>>> endpoint is an index that lists the conflicting documents.
>>>>>>
>>>>>> Jan and Alex suggested the index could be opt-in. They suggested an
>>>>>> "auto-warmer" - it would update the index every 1000 doc updates or
>>>>>> so. This way not every doc write would get slower. In a later iteration
>>>>>> we could even expose the "auto-warming" feature to other views.
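
The "auto-warmer" described above could be sketched roughly as a counter that triggers an index refresh once a configurable number of document updates has accumulated. `createAutoWarmer`, `recordUpdate` and the `refresh` callback are hypothetical names; this is not how CouchDB implements index updates, only an illustration of the trigger logic.

```javascript
// Hypothetical helper: calls `refresh` once every `threshold` recorded doc updates,
// so individual writes do not pay for an index update.
function createAutoWarmer(threshold, refresh) {
  let pending = 0;
  return {
    recordUpdate() {
      pending += 1;
      if (pending >= threshold) {
        pending = 0;
        refresh();
      }
    },
    // Number of updates seen since the last refresh.
    pending: () => pending,
  };
}

let refreshes = 0;
const warmer = createAutoWarmer(1000, () => { refreshes += 1; });
for (let i = 0; i < 2500; i += 1) {
  warmer.recordUpdate();
}
```

The threshold of 1000 mirrors the figure mentioned in the thread and would be the configurable part; the same trigger could in principle drive any view's index, which is the later iteration hinted at above.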
>>>>>>
>>>>>> Do you want to join me on my quest to provide the best conflict
>>>>>> resolution tools and education?
>>>>>> What do you think about it?
>>>>>>
>>>>>> Best,
>>>>>> Robert :)
>>>>>>
>>>>>> [1] https://cloud.githubusercontent.com/assets/298166/13741539/c4ecf6d0-e9ce-11e5-84c5-502b0989c290.png
>>>>>>
>>>>>
>>>
>>> --
>>> Professional Support for Apache CouchDB:
>>> https://neighbourhood.ie/couchdb-support/
>>>
>