Date: Wed, 16 Apr 2014 09:29:12 +0530
Subject: Re: Modeling Relationships and providing Transactional Integrity
From: Suraj Kumar
To: dev@couchdb.apache.org, user@couchdb.apache.org

Hello,

On second reading, it appears I made a lot of typos and left several
sentences half-finished in my original post. Leaving those aside, has
anybody managed to read it through and give it some thought? Any
questions or requests for clarification?

We'd really like to start off in a direction that will be useful, so your
advice would be much appreciated.

Thanks,

-Suraj

On Thu, Apr 10, 2014 at 6:24 PM, Suraj Kumar wrote:

> [warning: cross-posted]
>
> Hi,
>
> We're attempting to build a model of a large-scale, complex
> infrastructure. That means every machine and its supporting machines
> report to the mothership. Since our problem is truly one of high
> concurrency, choosing a solid database to keep the state of this model
> became our focus in the early days.
> We zeroed in on CouchDB, largely because Erlang powers it and because
> we can build on top of it the things we need that Couch doesn't
> provide. One of those things is the notion of Relationships.
>
> What do I mean by "Relationships"? Some "types" of entities have
> attributes that may be related to other "types" of entities in specific,
> known ways (1:1, 1:*, *:1).
>
> The "type" is the hazy part for schemaless systems like CouchDB.
> However, let us now talk in Couch primitives.
>
> Let us set aside the question of how this could still result in
> inconsistency in a live distributed database, and imagine that there
> could be 'design' documents describing how some attributes of some
> "types" of documents are related to attributes of other "types" of
> documents. Imagine that a new 'Relationships' engine could use these to
> automatically validate and maintain the relational integrity of the
> database. In Couch terminology, it is a way to automatically modify
> certain keys of related documents whenever certain keys of a given
> 'type' of document change.
>
> I'm now attempting to formally describe two of the basic primitive
> elements of every practical schemaless database system, specifically
> CouchDB:
>
> 1. Documents of classifiable 'types' or 'sets'.
> 2. Attributes (*keys of the JSON hash*), and a way to address attributes
>    using a generic, intuitive and standard "*convention*".
>
> I believe that defining these two formally is the first step towards
> implementing Relationships in CouchDB as a usable, general-purpose,
> optional feature (for those who are willing to compromise on some things
> in return :) ).
>
> Some more thoughts:
>
>    - "Types" in a schemaless JSON data structure can only be determined
>    by a function. Hence, there should be 'type'-determining functions, or
>    classifiers.
>    - Likewise, we have thus far been using a dotted-notation convention
>    to address specific attributes. This convention, or a similar one, can
>    be used by the relationship module (e.g. "os.version",
>    "last_modified.by.user.id"), as long as the keys themselves don't
>    contain a period ;)
>    - Every relationship will be kept in memory, in much the same way as
>    validate_doc_update functions are kept in memory and used for every
>    write.
>    - The regular doc PUT/POST API will fail when an attribute that is
>    involved in a relationship is changed on a document of a classifiable
>    'type'.
>    - To modify an attribute that is involved in a relationship, a
>    "transactional update" API must be used. All the documents related to
>    the change(s) must also be submitted through this "bulk_docs"-like API
>    (perhaps bulk_docs itself?).
>    - The idea is that a client initiating a transactional update first
>    fetches all related documents through a helper API, which
>    "denormalizes" them and returns them as one larger hash.
>    - The transactional update will also consult the defined relationships
>    and follow a 3PC (three-phase commit) protocol, in which an extra
>    metadata field in each document keeps the state of the ongoing
>    "transaction", so that failures caused by other concurrent
>    transactional updates can be tolerated.
>
> Thus, a design document that describes a relationship would look
> something like:
>
> {
>   "ClassifiedTypePerson": {
>     "classifier": function (doc) {
>       if (doc.blah && doc.blah2) {
>         return true;
>       }
>     },
>     "relationships": [
>       { "from": "my.attribute.to.reference.daddy",
>         "to": "ClassifiedTypeDaddy", "type": "1:1" },
>       { "from": "my.other.attribute.to.reference.kids",
>         "to": "ClassifiedTypeChildren", "type": "1:*" }
>     ]
>   }
> }
>
> This is just a sugary way of defining some commonly recurring
> auto-validation rules that invariably reference or depend on other
> documents, and it is not without compromises.
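The classifier and dotted-notation ideas in the quoted proposal can be sketched roughly as follows. This is only an illustration under stated assumptions: `relationshipTypes`, `getPath`, `classify` and `changedRelationshipPaths` are hypothetical names rather than CouchDB APIs, and the classifier is the placeholder one from the example design document. The sketch shows how a write path might detect that a plain PUT touches a relationship attribute and should therefore be refused.

```javascript
// Hypothetical relationship definitions, mirroring the example design document.
const relationshipTypes = {
  ClassifiedTypePerson: {
    // Placeholder classifier from the proposal: a doc is a "Person"
    // when both (made-up) fields are present and truthy.
    classifier: (doc) => Boolean(doc.blah && doc.blah2),
    relationships: [
      { from: "my.attribute.to.reference.daddy", to: "ClassifiedTypeDaddy", type: "1:1" },
      { from: "my.other.attribute.to.reference.kids", to: "ClassifiedTypeChildren", type: "1:*" },
    ],
  },
};

// Resolve a dotted path such as "os.version" against a document.
// Returns undefined if any segment along the way is missing.
// Assumes, as the proposal does, that keys never contain a period.
function getPath(doc, path) {
  return path.split(".").reduce(
    (node, key) => (node !== null && typeof node === "object" ? node[key] : undefined),
    doc
  );
}

// Return the names of all types whose classifier accepts the document.
function classify(doc, types) {
  return Object.keys(types).filter((name) => types[name].classifier(doc));
}

// List the relationship attributes whose value differs between the stored
// document and the incoming one. Values are compared via JSON.stringify,
// which is a simplification (it is sensitive to key order in nested objects).
function changedRelationshipPaths(oldDoc, newDoc, types) {
  const changed = [];
  for (const name of classify(newDoc, types)) {
    for (const rel of types[name].relationships) {
      const before = JSON.stringify(getPath(oldDoc, rel.from));
      const after = JSON.stringify(getPath(newDoc, rel.from));
      if (before !== after) changed.push(rel.from);
    }
  }
  return changed;
}
```

Under these assumptions, a regular write would be rejected whenever `changedRelationshipPaths(oldDoc, newDoc, relationshipTypes)` returns a non-empty list, leaving such changes to the proposed transactional update API.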
>
> The compromises are:
>
> - One-shard-forever compromise: since this is about infrastructure, the
>   data set will fit in under 2-4 GB. So even if the entire DB has to be
>   read by Couch, we don't care. This way, all "related" documents will be
>   found on the same disk. Unless, we formalize distributed
> - Unpredictable write times compromise: every write will involve a
>   predictable number of reads and predictable failure for those
>   attributes which are defined under a 'relationship' (attributes with
>   relationships can be modified only through a separate 'special' API
>   where all the related documents
>
> What do you think about this? Would people here find this useful for
> their day-to-day needs? Would the couchdb-devs merge this into mainline
> CouchDB if such a patch were submitted?
>
> Regards,
>
> -Suraj
>
> --
> An Onion is the Onion skin and the Onion under the skin until the Onion
> Skin without any Onion underneath.