From dev-return-48398-archive-asf-public=cust-asf.ponee.io@couchdb.apache.org Thu Feb 28 21:27:52 2019 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id 68CD3180657 for ; Thu, 28 Feb 2019 22:27:51 +0100 (CET) Received: (qmail 4996 invoked by uid 500); 28 Feb 2019 21:27:50 -0000 Mailing-List: contact dev-help@couchdb.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@couchdb.apache.org Delivered-To: mailing list dev@couchdb.apache.org Received: (qmail 4971 invoked by uid 99); 28 Feb 2019 21:27:50 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 28 Feb 2019 21:27:50 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id CD66FC2AED for ; Thu, 28 Feb 2019 21:27:49 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.272 X-Spam-Level: X-Spam-Status: No, score=0.272 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_PASS=-0.001, SPF_SOFTFAIL=0.972, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd4-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=messagingengine.com Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id VIYz9pfqkdCV for ; Thu, 28 Feb 2019 21:27:47 +0000 (UTC) Received: from wout2-smtp.messagingengine.com (wout2-smtp.messagingengine.com [64.147.123.25]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 28A2F62459 for ; Thu, 28 Feb 2019 21:27:47 +0000 (UTC) Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.west.internal (Postfix) with ESMTP id AF7FE35C5 for ; Thu, 28 Feb 2019 16:27:45 -0500 (EST) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Thu, 28 Feb 2019 16:27:45 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=content-transfer-encoding:content-type :date:from:in-reply-to:message-id:mime-version:references :subject:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm2; bh=FHY74m/fIMZEahHtKBl1dbsm/TrXiwde81vfdh2wy UQ=; b=XlPG6Xepsm0OL7EBgnkdMxLAFDx4onvlZ6yRn3OH8afwNs5i4SB44wfjY NU43XHEnKde4KVHlucvj8BNHAXtVvIauDzDw99aBJ3H9wbTgbu+5gij0vFYqNIq3 eK6kdgLxFkr3Ofnu1jOfVendyDgmFE7ppy/UIFBplMqVCmFeN0TeIAzTy1D3Ggxc g970HoaiNyH0C8lNGSAPP/U8BDw36tb/jEZHb8edfmZKhgXOyqQNSO8iFX5nXLIL g98L+K0u7UGv73apkTkRE9/rdogXhEzqkADEFaah3TXwxcVXPADG9ibcc/9MvJSb AuHxXIVgAaUz5qqAWNG/9xbUjVUNg== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedutddrvdefgdduheefucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpefhtgfgggfuffhfvfgjkffosehtqh hmtdhhtdejnecuhfhrohhmpeetuggrmhcumfhotgholhhoshhkihcuoehkohgtohhlohhs khesrghprggthhgvrdhorhhgqeenucffohhmrghinhepghhithhhuhgsrdgtohhmnecukf hppeduvdelrdegvddrvddtkedrudejvdenucfrrghrrghmpehmrghilhhfrhhomhepkhho tgholhhoshhksegrphgrtghhvgdrohhrghenucevlhhushhtvghrufhiiigvpedt X-ME-Proxy: Received: from kocolosk.cambridge.ibm.com (bi-03pt1.bluebird.ibm.com [129.42.208.172]) by mail.messagingengine.com (Postfix) with ESMTPA id 5A59E10319 for ; Thu, 28 Feb 2019 16:27:44 -0500 (EST) From: Adam Kocoloski Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: Re: # [DISCUSS] : things we need to solve/decide : storage of edit conflicts Date: Thu, 28 Feb 2019 16:27:43 -0500 References: <1549309264.3455235.1650702256.0515B088@webmail.messagingengine.com> To: dev@couchdb.apache.org In-Reply-To: Message-Id: <8442EBA8-7184-4CE5-BBBB-E1E5B3917015@apache.org> X-Mailer: Apple Mail (2.3445.102.3) I=E2=80=99ve gone ahead and submitted an RFC for the design discussed = here with a small modification: https://github.com/apache/couchdb/issues/1957 Cheers, Adam > On Feb 11, 2019, at 2:37 PM, Adam Kocoloski = wrote: >=20 > Agreed, I don=E2=80=99t have answer for this. I propose to drop the = optimization for now given the implementation complexity for any = solution that does not cause a performance degradation. >=20 > Adam >=20 >> On Feb 11, 2019, at 2:11 PM, Ilya Khlopotov = wrote: >>=20 >>> We could represent these using the following set of KVs: >>>=20 >>> (=E2=80=9Cfoo=E2=80=9D, =E2=80=9Cactive=E2=80=9D) =3D true >>> (=E2=80=9Cfoo=E2=80=9D, =E2=80=9Cowner=E2=80=9D) =3D kCONFLICT >>> (=E2=80=9Cfoo=E2=80=9D, =E2=80=9Cowner=E2=80=9D, =E2=80=9C1-abc=E2=80=9D= ) =3D =E2=80=9Calice=E2=80=9D >>> (=E2=80=9Cfoo=E2=80=9D, =E2=80=9Cowner=E2=80=9D, =E2=80=9C1-def=E2=80=9D= ) =3D =E2=80=9Cbob=E2=80=9D >> I still cannot see how we can figure out if conflict for JSON path is = present without reading previous revisions. The complex way of solving = the issue is to use some sort of succinct atomically updated structure = which we can quickly read. The structure would have to be capable of = answering the following question: >> - what are the hashes of different revisions of a subtree for a given = json path >>=20 >>=20 >>=20 >> On 2019/02/04 23:22:09, Adam Kocoloski wrote:=20= >>> I think it=E2=80=99s fine to start a focused discussion here as it = might help inform some of the broader debate over in that thread. >>>=20 >>> As a reminder, today CouchDB writes the entire body of each document = revision on disk as a separate blob. Edit conflicts that have common = fields between them do not share any storage on disk. The revision tree = is encoded into a compact format and a copy of it is stored directly in = both the by_id tree and the by_seq tree. Each leaf entry in the revision = tree contain a pointer to the position of the associated doc revision on = disk. >>>=20 >>> As a further reminder, CouchDB 2.x clusters can generate edit = conflict revisions just from multiple clients concurrently updating the = same document in a single cluster. This won=E2=80=99t happen when = FoundationDB is running under the hood, but users who deploy multiple = CouchDB or PouchDB servers and replicate between them can of course = still produce conflicts just like they could in CouchDB 1.x, so we need = a solution. >>>=20 >>> Let=E2=80=99s consider the two sub-topics separately: 1) storage of = edit conflict bodies and 2) revision trees >>>=20 >>> ## Edit Conflict Storage >>>=20 >>> The simplest possible solution would be to store each document = revision separately, like we do today. We could store document bodies = with (=E2=80=9Cdocid=E2=80=9D, =E2=80=9Crevid=E2=80=9D) as the key = prefix, and each transaction could clear the key range associated with = the base revision against which the edit is being attempted. This would = work, but I think we can try to be a bit more clever and save on storage = space given that we=E2=80=99re splitting JSON documents into multiple KV = pairs. >>>=20 >>> One thought I=E2=80=99d had is to introduce a special enum Value = which indicates that the subtree =E2=80=9Cbeneath=E2=80=9D the given Key = is in conflict. For example, consider the documents >>>=20 >>> { >>> =E2=80=9C_id=E2=80=9D: =E2=80=9Cfoo=E2=80=9D, >>> =E2=80=9C_rev=E2=80=9D: =E2=80=9C1-abc=E2=80=9D, >>> =E2=80=9Cowner=E2=80=9D: =E2=80=9Calice=E2=80=9D, >>> =E2=80=9Cactive=E2=80=9D: true >>> } >>>=20 >>> and=20 >>>=20 >>> { >>> =E2=80=9C_id=E2=80=9D: =E2=80=9Cfoo=E2=80=9D, >>> =E2=80=9C_rev=E2=80=9D: =E2=80=9C1-def=E2=80=9D, >>> =E2=80=9Cowner=E2=80=9D: =E2=80=9Cbob=E2=80=9D, >>> =E2=80=9Cactive=E2=80=9D: true >>> } >>>=20 >>> We could represent these using the following set of KVs: >>>=20 >>> (=E2=80=9Cfoo=E2=80=9D, =E2=80=9Cactive=E2=80=9D) =3D true >>> (=E2=80=9Cfoo=E2=80=9D, =E2=80=9Cowner=E2=80=9D) =3D kCONFLICT >>> (=E2=80=9Cfoo=E2=80=9D, =E2=80=9Cowner=E2=80=9D, =E2=80=9C1-abc=E2=80=9D= ) =3D =E2=80=9Calice=E2=80=9D >>> (=E2=80=9Cfoo=E2=80=9D, =E2=80=9Cowner=E2=80=9D, =E2=80=9C1-def=E2=80=9D= ) =3D =E2=80=9Cbob=E2=80=9D >>>=20 >>> This approach also extends to conflicts where the two versions have = different data types. Consider a more complicated example where bob = dropped the =E2=80=9Cactive=E2=80=9D field and changed the =E2=80=9Cowner=E2= =80=9D field to an object: >>>=20 >>> { >>> =E2=80=9C_id=E2=80=9D: =E2=80=9Cfoo=E2=80=9D, >>> =E2=80=9C_rev=E2=80=9D: =E2=80=9C1-def=E2=80=9D, >>> =E2=80=9Cowner=E2=80=9D: { >>> =E2=80=9Cname=E2=80=9D: =E2=80=9Cbob=E2=80=9D, >>> =E2=80=9Cemail=E2=80=9D: =E2=80=9Cbob@example.com" >>> } >>> } >>>=20 >>> Now the set of KVs for =E2=80=9Cfoo=E2=80=9D looks like this (note = that a missing field needs to be handled explicitly): >>>=20 >>> (=E2=80=9Cfoo=E2=80=9D, =E2=80=9Cactive=E2=80=9D) =3D kCONFLICT >>> (=E2=80=9Cfoo=E2=80=9D, =E2=80=9Cactive=E2=80=9D, =E2=80=9C1-abc=E2=80= =9D) =3D true >>> (=E2=80=9Cfoo=E2=80=9D, =E2=80=9Cactive=E2=80=9D, =E2=80=9C1-def=E2=80= =9D) =3D kMISSING >>> (=E2=80=9Cfoo=E2=80=9D, =E2=80=9Cowner=E2=80=9D) =3D kCONFLICT >>> (=E2=80=9Cfoo=E2=80=9D, =E2=80=9Cowner=E2=80=9D, =E2=80=9C1-abc=E2=80=9D= ) =3D =E2=80=9Calice=E2=80=9D >>> (=E2=80=9Cfoo=E2=80=9D, =E2=80=9Cowner=E2=80=9D, =E2=80=9C1-def=E2=80=9D= , =E2=80=9Cname=E2=80=9D) =3D =E2=80=9Cbob=E2=80=9D >>> (=E2=80=9Cfoo=E2=80=9D, =E2=80=9Cowner=E2=80=9D, =E2=80=9C1-def=E2=80=9D= , =E2=80=9Cemail=E2=80=9D) =3D =E2=80=9Cbob@example.com=E2=80=9D >>>=20 >>> I like this approach for the common case where documents share most = of their data in common but have a conflict in a very specific field or = set of fields.=20 >>>=20 >>> I=E2=80=99ve encountered one important downside, though: an edit = that replicates in and conflicts with the entire document can cause a = bit of a data explosion. Consider a case where I have 10 conflicting = versions of a 100KB document, but the conflicts are all related to a = single scalar value. Now I replicate in an empty document, and suddenly = I have a kCONFLICT at the root. In this model I now need to list out = every path of every one of the 10 existing revisions and I end up with a = 1MB update. Yuck. That=E2=80=99s technically no worse in the end state = than the =E2=80=9Czero sharing=E2=80=9D case above, but one could easily = imagine overrunning the transaction size limit this way. >>>=20 >>> I suspect there=E2=80=99s a smart path out of this. Maybe the system = detects a =E2=80=9Cdefault=E2=80=9D value for each field and uses that = instead of writing out the value for every revision in a conflicted = subtree. Worth some discussion. >>>=20 >>> ## Revision Trees >>>=20 >>> In CouchDB we currently represent revisions as a hash history tree; = each revision identifier is derived from the content of the revision = including the revision identifier of its parent. Individual edit = branches are bounded in *length* (I believe the default is 1000 = entries), but the number of edit branches is technically unbounded. >>>=20 >>> The size limits in FoundationDB preclude us from storing the entire = key tree as a single value; in pathological situations the tree could = exceed 100KB. Rather, I think it would make sense to store each edit = *branch* as a separate KV. We stem the branch long before it hits the = value size limit, and in the happy case of no edit conflicts this means = we store the edit history metadata in a single KV. It also means that we = can apply an interactive edit without retrieving the entire conflicted = revision tree; we need only retrieve and modify the single branch = against which the edit is being applied. The downside is that we = duplicate historical revision identifiers shared by multiple edit = branches, but I think this is a worthwhile tradeoff. >>>=20 >>> I would furthermore try to structure the keys so that it is possible = to retrieve the =E2=80=9Cwinning=E2=80=9D revision in a single limit=3D1 = range query. Ideally I=E2=80=99d like to proide the following = properties: >>>=20 >>> 1) a document read does not need to retrieve the revision tree at = all, just the winning revision identifier (which would be stored with = the rest of the doc) >>> 2) a document update only needs to read the edit branch of the = revision tree against which the update is being applied, and it can read = that branch immediately knowing only the content of the edit that is = being attempted (i.e., it does not need to read the current version of = the document itself). >>>=20 >>> So, I=E2=80=99d propose a separate subspace (maybe =E2=80=9C_meta=E2=80= =9D?) for the revision trees, with keys and values that look like >>>=20 >>> (=E2=80=9C_meta=E2=80=9D, DocID, IsDeleted, RevPosition, RevHash) =3D = [ParentRev, GrandparentRev, =E2=80=A6] >>>=20 >>> The inclusion of IsDeleted, RevPosition and RevHash in the key = should be sufficient (with the right encoding) to create a range query = that automatically selects the =E2=80=9Cwinner=E2=80=9D according to = CouchDB=E2=80=99s arcane rules, which are something like >>>=20 >>> 1) deleted=3Dfalse beats deleted=3Dtrue >>> 2) longer paths (i.e. higher RevPosition) beat shorter ones >>> 3) RevHashes with larger binary values beat ones with smaller values >>>=20 >>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>>=20 >>> OK, that=E2=80=99s all on this topic from me for now. I think this = is a particularly exciting area where we start to see the dividends of = splitting up data into multiple KV pairs in FoundationDB :) Cheers, >>>=20 >>> Adam >>>=20 >>>=20 >>>> On Feb 4, 2019, at 2:41 PM, Robert Newson = wrote: >>>>=20 >>>> This one is quite tightly coupled to the other thread on data = model, should we start much conversation here before that one gets = closer to a solution? >>>>=20 >>>> --=20 >>>> Robert Samuel Newson >>>> rnewson@apache.org >>>>=20 >>>> On Mon, 4 Feb 2019, at 19:25, Ilya Khlopotov wrote: >>>>> This is a beginning of a discussion thread about storage of edit=20= >>>>> conflicts and everything which relates to revisions. >>>>>=20 >>>>>=20 >>>=20 >>>=20 >=20