From: "Sanjuan, Hector"
To: "user@couchdb.apache.org"
Subject: Re: How to store the delta between doc revisions?
Date: Thu, 2 Oct 2014 13:56:50 +0000
Message-ID: <1412258210145.35590@here.com>

All is taken care of on the client side.

I don't store deltas/patch files per se; I actually store full "previous" and
"current" versions of the doc(s). The client should be able to produce a diff
when needed, in whatever format is required.

You could implement caching mechanisms if needed (memcache-like if you want).
In my case documents are fairly small and I am not particularly worried
about the delay introduced by an extra GET.

As you see, there is nothing especially clever in my approach and it's quite
lousy in many respects. It does not care much about consistency, i.e.
if a PUT succeeds, the subsequent transaction POST might fail. And with
replication enabled, two people could edit a document in conflicting ways and
each of them would get a transaction record, even though one of their changes
will get discarded in the conflict resolution. And well, a whole universe of
failures can happen when editing multiple docs for 1 transaction. So this is
more of a simple paper trail which I keep for some months and then delete
anyway. If you aim for a fully consistent, race-condition-proof solution, it
is going to be difficult (possibly impossible in a multi-master scenario).
Perhaps with update handlers you can reach a compromise solution, but I'm not
sure how that is going to work on multi-node setups either.

H
________________________________________
From: Eric B
Sent: Thursday, October 2, 2014 15:17
To: user@couchdb.apache.org
Subject: Re: How to store the delta between doc revisions?

On Thu, Oct 2, 2014 at 4:16 AM, Sanjuan, Hector wrote:

> I manage this outside CouchDB. I have a separate database for
> "transaction" docs which store things like the date a modification occurred
> and the resources that changed and how (one transaction can account for
> changes for several docs if it happened to be triggered by the same
> operation).
>

Can you elaborate how you do this?  I presume it must all be taken care of
on the client side?  I haven't found any way to accomplish something like
this via update handlers.

> The main objective is to be able to figure out who touched a doc, when, and
> what change was likely introduced (we don't expect to revert/restore old
> revisions too often, although we could).
>

So you only store patches between revs then, I presume?  Do you actually use
something to do a true patch file, or just a key/value pair?  i.e.:
ie:=0A= field1=3Dnew value, field2=3Dnew value, etc.=0A= =0A= =0A= > It has an overhead (every write triggers a GET to fetch the last revision= )=0A= > and doesn't bother much about race conditions or strict history consisten= cy=0A= > (if you do bother too much about these you lose many advantages of the=0A= > noSQL model), but it is really simple to implement (and there is no need = to=0A= > debug code that runs inside couchdb).=0A= >=0A= =0A= Have you considered maintaining a local cache to avoid additional gets=0A= everytime? ie: upon the original get, cache the data and then check the=0A= cache whenever a write is executed.=0A= =0A= I have considered this system, but without multi-document transactions,=0A= there is no way to ensure consistency. (ie: if the document update=0A= succeeds and the history log fails, it is too difficult to roll back the=0A= doc update). And if only storing deltas, missing a rev would make it=0A= impossible to rebuild the history of any document. Additionally, there is= =0A= no way to effectively use update handlers, for the same reason as above.=0A= The history log would have to be written only upon success of the update=0A= handler, at which time it may or not be a successful write. Plus, it is=0A= more difficult retrieving the older rev of the doc that was just updated.= =0A= =0A= Unless I am making things too complicated?=0A= =0A= Thanks,=0A= =0A= Eric=0A= =0A= =0A= =0A= =0A= >=0A= > ________________________________________=0A= > From: Alexander Shorin =0A= > Sent: Wednesday, October 1, 2014 22:23=0A= > To: user@couchdb.apache.org=0A= > Subject: Re: How to store the delta between doc revisions?=0A= >=0A= > That's right: validate_doc_update cannot modify a document to store.=0A= > But it could check if previous version is included into history log=0A= > stored within update document - what is actually your update handled=0A= > doing. 
> So clients have to use your update handlers or implement the
> same logic on their side to pass validation.
> --
> ,,,^..^,,,
>
>
> On Wed, Oct 1, 2014 at 11:45 PM, Eric Benzacar wrote:
> > As you mentioned, the update_notif_handler and changes feed are things that
> > are triggered after a document is persisted, so they can cause race
> > conditions.  Ideally, I'm looking to trigger a handler just before it is
> > persisted.
> >
> > I looked into the validate_doc_update function, but even if I want to store
> > the history log within the document (not opposed to it), I can't seem to
> > modify the contents in the validate_doc_update function (which is
> > appropriate).  So I'm still no further ahead in figuring out a central
> > place to do this.
> >
> > So then I am reduced to ensuring that every updateHandler I call creates a
> > history log, and that POSTs/PUTs of the document do it as well.  Which means
> > that I am putting code in several different places to perform the same task,
> > which is error prone and leads to fragmentation.
> >
> > Unless I am missing something?
> >
> > Thanks,
> >
> > Eric
> >
> > On Wed, Oct 1, 2014 at 3:30 PM, Alexander Shorin wrote:
> >
> >> Sadly, no. At least not completely. You can create your own
> >> validate_doc_update function which will verify that every newly stored
> >> document contains some specific data (like the previous document version,
> >> to which validate_doc_update also has access), but all this leads to
> >> storing the history log inside a single document.
> >> If you want to track it
> >> separately: changes feed and update_notification_handler are your
> >> friends, but race conditions can happen (especially if
> >> compaction gets triggered), so there will always be a chance for you to
> >> miss some revision.
> >> --
> >> ,,,^..^,,,
> >>
> >>
> >> On Wed, Oct 1, 2014 at 11:18 PM, Eric B wrote:
> >> > Thanks for the valid points.  But either way (whether through patches or
> >> > storing the full previous revision), is there a mechanism in CouchDB by
> >> > which I can require all calls to trigger an updateHandler?  In a way, I'm
> >> > looking more for an update interceptor; something to be run just before a
> >> > document is actually persisted to the DB, but that is always executed.
> >> >
> >> > Thanks,
> >> >
> >> > Eric
> >> >
> >> >
> >> > On Wed, Oct 1, 2014 at 3:03 PM, Alexander Shorin wrote:
> >> >
> >> >> Storing patches is fine as long as you're sure that no single patch
> >> >> will suddenly get deleted. Otherwise you could easily find all your
> >> >> history broken. Obvious, but it is the thing to remember when picking
> >> >> this way of history management. Storing full document copies per
> >> >> revision is a more solid solution for such a case: you can easily skip
> >> >> or lose one or several revisions and be fine, but it also consumes
> >> >> much more disk space.
> >> >> Trade-offs are everywhere; pick the one that suits you.
> >> >> --
> >> >> ,,,^..^,,,
> >> >>
> >> >>
> >> >> On Wed, Oct 1, 2014 at 10:02 PM, Eric B wrote:
> >> >> > I'm new to CouchDB and trying to figure out the best way to store a
> >> >> > history of changes for a document.
> >> >> >
> >> >> > Originally, I was thinking the thing that makes the most sense is to
> >> >> > use the update function of CouchDB, but I'm not entirely sure if I
> >> >> > can.  Is there some way to use the update function and modify/create
> >> >> > a second document in the process?
> >> >> >
> >> >> > For example, say I have a document which contains notes for a client.
> >> >> > Every time I modify the notes document (i.e.: add new lines or delete
> >> >> > lines), I want to maintain the changes made to it.  If there was a way
> >> >> > to use CouchDB's rev fields for this, my problem would be solved, but
> >> >> > since CouchDB deletes non-current revs upon compaction, that is not an
> >> >> > option.
> >> >> >
> >> >> > So instead, I want to create a "history_log" document, where I can
> >> >> > just store the delta between documents (as a patch, for example).
> >> >> >
> >> >> > In order to do this, I need to have my existing document, my new
> >> >> > document, compare the changes and write them to a history_log
> >> >> > document.  But I don't see if/where I can do that within an update
> >> >> > handler.
> >> >> >
> >> >> > Is there something that can help me do this easily within CouchDB?
> >> >> > Are there patch or JSON compare functions I can have access to from
> >> >> > within a CouchDB handler?
> >> >> >
> >> >> > Thanks,
> >> >> >
> >> >> > Eric
> >> >>
> >>
>
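
For illustration only, here is a rough sketch of the client-side "transaction" record discussed in this thread: keep full previous and current versions of each doc, and derive a field-level diff only when someone asks for it. It is not the code actually used by anyone in the thread. The names diff_docs, build_transaction and IGNORED_FIELDS, and the layout of the transaction document, are assumptions made up for this example. It does no HTTP at all; fetching the previous revision (GET), saving the doc (PUT) and saving the transaction (POST) are left to the client.

```python
from datetime import datetime, timezone

# CouchDB bookkeeping fields that change on every write; skipped when diffing.
IGNORED_FIELDS = {"_rev", "_revisions"}


def diff_docs(previous, current):
    """Return {field: {"old": ..., "new": ...}} for top-level fields that differ.

    A field missing on one side is reported as None, so a field explicitly
    set to None is not distinguished from an absent one.
    """
    changes = {}
    for key in sorted(set(previous) | set(current)):
        if key in IGNORED_FIELDS:
            continue
        old, new = previous.get(key), current.get(key)
        if old != new:
            changes[key] = {"old": old, "new": new}
    return changes


def build_transaction(user, pairs, when=None):
    """Build one transaction doc covering several (previous, current) pairs.

    The document layout here is hypothetical, not a CouchDB convention.
    """
    when = when or datetime.now(timezone.utc)
    return {
        "type": "transaction",
        "user": user,
        "date": when.isoformat(),
        "resources": [
            {"id": current["_id"], "previous": previous, "current": current}
            for previous, current in pairs
        ],
    }
```

As noted above, nothing here makes the two writes atomic: the PUT of the doc can succeed while the POST of the transaction record fails, so this is a paper trail rather than a consistent history.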