From: Jurg van Vliet
To: dev@couchdb.apache.org
Subject: Re: struggling with couchdb in production
Date: Tue, 26 May 2009 09:11:15 +0200
Message-Id: <91495719-84AC-4635-A69A-FE3FFD091E5E@gmail.com>
In-Reply-To: <4A1B1580.1060609@borwankar.com>

On May 26, 2009, at 12:02 AM, Nitin Borwankar wrote:

> Hi guys,
>
> Coming from a long bout of "relational database illness" (18+
> years) from which I rapidly recovered after the doctor ordered
> CouchDB, here's how I think about it. Just some very loose informal
> rules of thumb.
>
> A couch db data model is a denormalized data model - so don't start
> with an ER diagram and map to tables, add indexes, pr.key->f.key etc.
> Normalization is an unnatural act in couchdb and documents.
>
> It may be better to start with an object diagram and UML if you want
> to go that route.
> The big question is how far to go with the denormalization.
>
> If your model is an acyclic graph you can theoretically have just
> one large document that is deeply nested.
> But you probably will go two or three levels deep max.

i agree with this wholeheartedly. but i would like to have some other
technique that helps in modeling. from what you suggest, my experience
and chris' remarks it appears as if we are all looking at some form of
maximum normal form, instead of minimum normal form. or not? what is the
maximum you can get away with? but what is the cost of maximization?

@chris, i don't think 'update congestion' is the MAIN problem, it
certainly is one of the problems that may arise. but in the case of
users involved i see conflicts as something they should handle, because
it has meaning, and should be reacted to as such. i understand that in a
livechat a user is not interested so much in being 'interrupted' all the
time as she wants to say something :P

> But if your model is a meshed network then you probably want to go
> two levels - e.g. take a look at the Twitter JSON response format and
> how it embeds user info inside a status message, and in contrast how
> it embeds status message (last status) inside user object - in each
> case the embedded object has just a few of the attributes of the
> original object - just enough to provide meaningful info in context
> of the containing object.
> Instead of foreign keys use URI's - you could use namespaced URI's
> sometablename.id in relational model becomes namespace:localid
> Of course you can just use couchdb GUIDs if you want.

yes, i also agree with this. but i don't have a clear and clean solution
of dealing with the data replication at this level. i don't expect
couchdb replication to give a hand, it would mean sort of per-document
dynamic replication strategy. (it would be nice though, only then i
would like to have it hidden deep away in something like activerecord in
the case of rails :))

in one solution we have implemented a two-way relationship a little bit
like this.
we use couchdb keys as a reference, and as long as we know which
database the document is in they are unique. and we accept the cost of
reading the database some more times, to get the necessary information.
(i am not so afraid of reading, writing is different, though.)

> And finally in typical Rails-like webapps you have result sets for
> navigation and browsing -
>
> here
> * "select col1, col2 where ..." corresponds to a map() function with
>   some logic and then emit(doc.attr1, doc.attr2) - very loosely
>   speaking.
> * "select count(col3)" and similar aggregates are achieved by having
>   a reduce() in addition to the map()

yes, but these 2 patterns are too limited. you still want to combine
different sorts of information in your database. the biggest problem in
using reduce is that it can't 'undo' an emit, it can't disregard or
disqualify previously emitted rows. i have no idea if this is something
that would be helpful in couchdb, but i have found myself wishing for
something like this.

> Hope this helps,

yes, nitin, this certainly helps me. it helps knowing my thinking is at
least in the same direction as others. and, thank you for sharing :)

> Nitin Borwankar.
>
> (Perhaps this should be a blog post ?)
>
> Chris Anderson wrote:
>> On Mon, May 25, 2009 at 12:18 PM, Jurg van Vliet wrote:
>>
>>> guys and girls,
>>>
>>> i am a 'real' user of couchdb, and i am having a lot of fun with it
>>> in addition to creating real value! but it is far from easy,
>>> especially in combination with a framework that is built around
>>> relational databases like rails. and still, after 4 months of
>>> intensively working with couchdb i am still a big fan.
>>>
>>> but couchdb is not finished yet. and i don't mean not finished in
>>> the sense of the software program that you can run, or the community
>>> that is building this. what i mean is that there is no documented
>>> approach to model real world problems in a couchdb way.
>>> you can search but the most interesting examples are to clarify the
>>> idea, or to show that it is possible. but nothing that helps me
>>> think about when to use a document, when a database, when a view,
>>> etc. etc.
>>>
>>> we have taken a couple of wrong design decisions the last couple of
>>> months. you can call it ignorance, or hindsight, or something else.
>>> i think it is just the lack of a good framework for thinking
>>> couchdb.
>>>
>>> when you make your relational database model, your tables, your
>>> rows, your indexes, etc. there is a large body of documentation that
>>> helps you approach the problem. and even with years of practice, and
>>> people having the word database and administrator in their job
>>> title, designing your database models is just difficult. (there are
>>> really not many people i want to have thinking about tables and rows
>>> and indexes.)
>>>
>>> so now we have to make this paradigm shift. how are WE managing to
>>> struggle through this?
>>>
>>> one of my personal insights is that couchdb is so different from a
>>> relational database that it is best approached as if it is the
>>> opposite. in a rdb you 'minimize' the entity of information, you
>>> normalize until it is small enough to still have meaning. once
>>> everything is deconstructed you add rules (validations) your data
>>> must adhere to. having done that you start to put it back together
>>> using joins.
>>>
>>
>> yes, there's a lot of "unlearning" that needs to be done, and that
>> takes time.
>>
>>> in couchdb this pattern doesn't work very well, at least not for us.
>>> we learned it is easier to put as much data together in one document
>>> as possible. my rule of thumb of when to stop is in distribution. i
>>> often ask myself 'do i want to keep this together when i move it to
>>> another database?' once you have your documents, views are very
>>> convenient to take your documents apart.
>>>
>>
>> My rule of thumb is that you want documents to contain their own
>> context. An individual document should make sense even if you don't
>> have any others that it may refer to.
>>
>> The main pressure getting you to split data into multiple documents
>> is update contention. If a lot of people are editing a list
>> simultaneously, then you need to make each list item its own
>> document. If only one person ever edits the list, and the list is
>> relatively short, then putting it in one document may be easier.
>>
>>> a database in couchdb is the place where work comes together, in our
>>> case this is the location where a group of people shares. combining
>>> information from different databases will be necessary. and i really
>>> have no clue yet how to approach this problem. so anyone?
>>>
>>
>> The easiest thing is to merge the databases with replication.
>>
>>> today i found myself in a sort of discussion with jchris and jan (i
>>> am sorry for the other jchris' and jans, but everyone knows who i
>>> mean.) guys, what i mean to say is that i am happy with your work.
>>> but your work is very very important to me. i think my work along
>>> with all the work of your users is what is going to make this
>>> movement great. if you help us succeed, you will have what you want.
>>>
>>
>> If you're interested we'll be hosting a CouchDB tutorial in London
>> next month: http://erlang-factory.com/conference/London2009/university/CouchDB
>>
>> 'scuse the plug :)
>>
>>> (the reason i sent it to both lists is that i think this 'couchdb
>>> way' of working is something that is not the problem of use OR
>>> development. it is necessary to make everyone work together and find
>>> out where couchdb's future lies.)
>>>
>>> groet,
>>> jurg.
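
The two patterns from Nitin's list ("select col1, col2 where ..." as a map() with emit(), and "select count(col3)" as a reduce() on top of the map()) can be sketched roughly as below. This is only a plain Python simulation, not a real CouchDB view: the sample documents, their field names, and the helpers map_status, run_view and reduce_count are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample documents; the field names are invented for this sketch.
DOCS = [
    {"_id": "a", "type": "status", "user": "jurg", "text": "hello"},
    {"_id": "b", "type": "status", "user": "nitin", "text": "hi guys"},
    {"_id": "c", "type": "status", "user": "jurg", "text": "groet"},
    {"_id": "d", "type": "user", "name": "jurg"},
]


def map_status(doc):
    """Roughly 'select user, text where type = status'."""
    if doc.get("type") == "status":
        # Yielding a (key, value) pair stands in for emit(key, value).
        yield doc["user"], doc["text"]


def run_view(docs, map_fn):
    """Apply the map function to every document and sort the rows by key."""
    rows = [row for doc in docs for row in map_fn(doc)]
    return sorted(rows)


def reduce_count(rows):
    """Roughly 'select count(...) group by key': count the rows per key."""
    counts = defaultdict(int)
    for key, _value in rows:
        counts[key] += 1
    return dict(counts)


rows = run_view(DOCS, map_status)
counts = reduce_count(rows)
print(rows)    # [('jurg', 'groet'), ('jurg', 'hello'), ('nitin', 'hi guys')]
print(counts)  # {'jurg': 2, 'nitin': 1}
```

In this sketch the filtering on type happens in the map step. The reduce only ever sees rows that were already emitted, which matches the limitation described above: a reduce cannot 'undo' an emit.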