couchdb-dev mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: db/_all_conflicts
Date Tue, 29 Mar 2016 18:14:30 GMT
Neat stuff. Years ago I actually committed this feature to the codebase using a table scan
and then Damien backed it out because of the scalability concern. Glad to see we’re approaching
it in a more considered fashion this time around :)

One thing we might consider is to maintain a *count* of the number of conflicted documents
in the database automatically. If the count is nonzero when you expected it to be zero, build
the conflicted documents index and do your inspection. In the happy case where there are no
conflicts we just saved you a bunch of effort.

We don’t really need a separate index to accomplish this; we just need to modify the reducer
function supplied to the by_id btree. We’ve played that game before to add things like data
size accumulators to the DB info object. There may be a modest hit to the write performance
to count the number of non-deleted leafs in the rev tree on document update, but honestly
that says as much about the inefficiencies in couch_key_tree as anything else - that quantity
ought to be very cheap to uncover.
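
To make the counting idea concrete, here is a minimal sketch in plain JavaScript. It is a toy model, not couch_key_tree or the real by_id reducer: the `leaves` shape and all function names are assumptions made up for illustration. A document counts as conflicted when it has more than one non-deleted leaf, and the reducer just sums those up.

```javascript
// Hypothetical shape: each doc carries the leaf revisions of its rev tree,
// e.g. { id: "a", leaves: [{ rev: "2-x", deleted: false }] }.

// Number of leaf revisions that are not deletions.
function liveLeafCount(leaves) {
  return leaves.filter((leaf) => !leaf.deleted).length;
}

// A document is conflicted when more than one live leaf remains.
function isConflicted(doc) {
  return liveLeafCount(doc.leaves) > 1;
}

// Reduce over the documents in one node: count the conflicted ones.
function reduceConflicts(docs) {
  return docs.reduce((count, doc) => count + (isConflicted(doc) ? 1 : 0), 0);
}

// Rereduce over child nodes: sum the counts they already computed.
function rereduceConflicts(counts) {
  return counts.reduce((sum, n) => sum + n, 0);
}
```

With a total like this kept at the root, a nonzero value is the signal to go build the conflicts index; zero means there is nothing to inspect.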

Adam

> On Mar 29, 2016, at 1:26 PM, Robert Kowalski <rok@kowalski.gd> wrote:
> 
> Hi,
> 
> good points!
> 
>> 3.1. An optimisation of 3. would be making this an Erlang view, but that would come with
>> the additional security concern of opening up Erlang views.
> 
> The great thing about Mango is, with an index Mango is faster than JS
> views as it is Erlang based.
> 
> 
> And Dale is making a good suggestion.
> 
> ```
> {
>  selector: {
>    _conflicts: {'$exists': true}
>  }
> }
> ```
> 
> The selector already works without an index with the latest change in
> Mango, it doesn't strictly require an index for ad-hoc queries any
> more: https://github.com/apache/couchdb-mango/commit/01252f971bef0c8da1d78bf5a7b506b71926ce1b
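
For readers unsure what the selector picks out, a minimal sketch in plain JavaScript follows. It is not Mango's implementation; `fieldExists`, `findConflicted` and the sample documents are hypothetical. It assumes conflicted documents carry a `_conflicts` array, as the selector above implies, and treats `$exists` as a check on whether the field is present regardless of its value.

```javascript
// True when the field's presence on the doc equals the wanted state.
function fieldExists(doc, field, shouldExist) {
  return Object.prototype.hasOwnProperty.call(doc, field) === shouldExist;
}

// Equivalent of { _conflicts: { $exists: true } } applied to an in-memory list.
function findConflicted(docs) {
  return docs.filter((doc) => fieldExists(doc, "_conflicts", true));
}

// Hypothetical sample data: only "b" has a losing revision recorded.
const sampleDocs = [
  { _id: "a", _rev: "1-aaa" },
  { _id: "b", _rev: "2-bbb", _conflicts: ["2-ccc"] },
];

const conflictedIds = findConflicted(sampleDocs).map((doc) => doc._id);
```

Without an index this is effectively what an ad-hoc query has to do for every document, which is why it suits development better than large production databases.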
> 
> Cool so we are already almost done! :)
> 
> This is great for development and I wonder if we could reduce the
> friction for people that would like to use an index for conflicts,
> e.g. in their production systems. Remember, the mission is to make
> conflict handling a first class citizen in CouchDB and make it as easy
> as possible for our users.
> 
> Current state:
> 
> POST to `$DB/_index`:
> 
> ```
> 
> {
>    "index": {
>        "fields": ["_conflicts"]
>    },
>    "name" : "conflict-index",
>    "type" : "json"
> }
> 
> ```
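
Short of new API sugar, a client-side helper can already hide most of the typing. The sketch below only builds the request described above; `conflictIndexRequest`, `baseUrl` and `dbName` are hypothetical names, and nothing is sent over the network.

```javascript
// Builds the POST $DB/_index request for a JSON index on _conflicts.
// baseUrl and dbName are placeholders supplied by the caller.
function conflictIndexRequest(baseUrl, dbName, indexName = "conflict-index") {
  return {
    url: `${baseUrl}/${encodeURIComponent(dbName)}/_index`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      index: { fields: ["_conflicts"] },
      name: indexName,
      type: "json",
    }),
  };
}
```

The returned object can be handed to whatever HTTP client is in use, for example `fetch(req.url, req)` in an environment that provides `fetch`.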
> 
> I feel it is hard to type on the terminal, e.g. when I use curl. With
> a JS HTTP client it is also a lot to type.
> 
> 
> I thought about API sugar. I feel unsure about API-sugar which could
> abstract this somehow, as I don't want to pollute the API. At the same
> time I would also like to make it as easy as possible for users to
> handle their conflicts.
> 
> Rough idea:
> 
> POST to `$DB/_index`:
> 
> ```
> { "type" : "conflicts" }
> ```
> 
> Hmmm....
> 
> What do you think?
> 
> On Mon, Mar 14, 2016 at 4:54 PM, Jan Lehnardt <jan@apache.org> wrote:
>> 
>>> On 14 Mar 2016, at 16:22, Dale Harvey <dale@arandomurl.com> wrote:
>>> 
>>> I would really like to give users better abilities to handle conflict
>>> resolution, I am however extremely worried about introducing
>>> another API endpoint. We have like 6/7 read APIs, each of them having their
>>> own idiosyncrasies, and it's extremely confusing for users to know which to
>>> use when.
>>> 
>>> If we could extend our existing APIs to cater for this use case it seems
>>> hugely preferable, ie something like mango / pouchdb find
>>> 
>>> db.find({
>>> selector: {
>>>   _conflicts: {'$exists': true}
>>> }
>>> }).then(function (result) {
>>> ...
>>> });
>> 
>> Great input Dale!
>> 
>> Let’s split this into two issues then:
>> 
>> A. how do we present it to users.
>> B. how do we get the information.
>> 
>> 
>> As for B., the thought process went like this:
>> 
>> 1. _all_docs + Erlang filter.
>> 
>> As Robert pointed out, that’s a no-go for large databases.
>> 
>> 
>> 2. Add another index to the main database file like by-seq/by-id (_changes/_all_docs)
>> 
>> I pointed out that this will make all write operations slower, for everyone, not
>> just for the people who want this. (A scenario where I wouldn’t want this is where CouchDB
>> is the cloud-counterpart for one or more PouchDB instances, and conflict resolution only ever
>> happens in PouchDB).
>> 
>> So I’d say this is a soft-no on adding this to the main database file, also given
>> that we had similar discussions about adding another index to view files before.
>> 
>> 
>> 3. A view: Fauxton could hide creating a ddoc behind a button, and users could opt
>> into this easily, while understanding the trade-offs.
>> 
>> Robert feels like tying this to Fauxton as opposed to CouchDB makes this approach
>> useful for fewer people than it could (props for not being focussed on your own project there
>> ;)
>> 
>> 
>> 3.1. An optimisation of 3. would be making this an Erlang view, but that would come
>> with the additional security concern of opening up Erlang views.
>> 
>> 
>> 4. Given all of the above, how about this: a new CouchDB module (couch_conflicts)
>> that is essentially an Erlang view for conflicts that is disabled by default, but when enabled
>> uses the native query server to build an index that can give the list of conflicting documents
>> (and the conflicting revisions?) *without* having to enable the native query server for everyone.
>> The module can be enabled in the config (or admin PUT to the endpoint as other things in 2.0).
>> We’d also build a basic keep-view-indexes-up-to-date that would trigger an update after,
>> say, 1000 doc updates (we’d make that configurable of course), something which we’d want
>> for other views as well anyway.
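
The "update after N doc updates" part can be sketched as a simple counter. This is a hedged illustration in JavaScript rather than anything from the proposed couch_conflicts module; `createAutoWarmer`, `threshold` and `refreshIndex` are hypothetical names.

```javascript
// Calls refreshIndex once every `threshold` document updates.
// threshold plays the role of the configurable "1000 doc updates".
function createAutoWarmer(threshold, refreshIndex) {
  let pending = 0; // updates seen since the last refresh

  return {
    // To be invoked once per document update.
    onDocUpdate() {
      pending += 1;
      if (pending >= threshold) {
        pending = 0;
        refreshIndex();
      }
    },
    // Updates not yet reflected in the index.
    pendingUpdates() {
      return pending;
    },
  };
}
```

Individual writes only pay for a counter increment; the index work happens in batches, at the cost of the index lagging by up to `threshold - 1` updates.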
>> 
>> * * *
>> 
>> As for A., how we present this to the user I have no strong feelings about. We could
>> make this part of Mango, like Dale suggested, or a new /db/_all_conflicts with its own idiosyncrasies
>> or something else ;)
>> 
>> 
>> I just want to make sure we make the right trade-offs on the storage/indexing level,
>> and, while not making everyone pay for the overhead, make it really easy to opt into this
>> feature. (Unless we all agree that the performance hit for 2. is worth it :)
>> 
>> 
>> Best
>> Jan
>> --
>> 
>> 
>> 
>> 
>>> 
>>> 
>>> On 14 March 2016 at 14:07, Sebastian Rothbucher <
>>> sebastianrothbucher@googlemail.com> wrote:
>>> 
>>>> Hi Robert,
>>>> 
>>>> this looks awesome already! I don't want to be the spoiler in this, but
>>>> wouldn't conflicts occur recently, e.g. using _changes (descending) might
>>>> do the trick of limit-ing? (Still you'd discard docs that simply don't have
>>>> conflicts, but probably way not that many)
>>>> 
>>>> If that doesn't do the trick: just forget what I just said ;-)
>>>> 
>>>> Best
>>>>   Sebastian
>>>> 
>>>> On Mon, Mar 14, 2016 at 2:58 PM, Robert Kowalski <rok@kowalski.gd> wrote:
>>>> 
>>>>> Hi folks,
>>>>> 
>>>>> it is hackweek for the Fauxton team and I am lucky enough to be able
>>>>> to work on whatever I want :)
>>>>> 
>>>>> Conflicts are an integral part of CouchDB. Right now I dream of making
>>>>> conflict-resolution a first class citizen in Couch. Conflict
>>>>> resolution requires a lot of manual steps. The idea is to give the
>>>>> user all the tools they need to easily solve conflicts, and also to
>>>>> help them to avoid conflicts in the future.
>>>>> 
>>>>> To empower every user to detect and solve conflicts easily on their
>>>>> own, instead of writing some custom bash/js scripts and custom view
>>>>> hackery I would like to have a list of conflicts in Fauxton for every
>>>>> database.
>>>>> 
>>>>> The list, provided by Couch, shows which documents have conflicts. I
>>>>> can then click on the conflicting doc and get a nice diffing editor
>>>>> which helps me to solve the conflict. Here's an early draft: [1]
>>>>> 
>>>>> Discussing the matter in couchdb-dev we thought about serverside
>>>>> filtering of _all_docs - which is a problem for large databases.
>>>>> 
>>>>> Another option is a new endpoint, e.g. /db/_all_conflicts. Behind this
>>>>> endpoint is an index which is listing the conflicting documents.
>>>>> 
>>>>> Jan and Alex suggested the index could be opt-in. They suggested an
>>>>> "auto-warmer" - it would update the index every 1000 doc updates or
>>>>> so. This way not every doc write would get slower. In a later iteration
>>>>> we could even expose the "auto-warming" feature to other views.
>>>>> 
>>>>> Do you want to join me on my quest to provide the best conflict
>>>>> resolution tools and education?
>>>>> What do you think about it?
>>>>> 
>>>>> Best,
>>>>> Robert :)
>>>>> 
>>>>> [1] https://cloud.githubusercontent.com/assets/298166/13741539/c4ecf6d0-e9ce-11e5-84c5-502b0989c290.png
>>>>> 
>>>> 
>> 
>> --
>> Professional Support for Apache CouchDB:
>> https://neighbourhood.ie/couchdb-support/
>> 

