Mailing-List: contact user-help@couchdb.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@couchdb.apache.org
Received-SPF: pass (athena.apache.org: local policy)
Date: Wed, 17 Dec 2008 13:20:18 -0800
From: James Marca <jmarca@translab.its.uci.edu>
To: user@couchdb.apache.org
Subject: Re: General Q about CouchDB
Message-ID: <20081217212018.GG5560@translab.its.uci.edu>
Mail-Followup-To: user@couchdb.apache.org
References: <389be9770812161146ldcbd435l32300db81573ebd0@mail.gmail.com>
 <e282921e0812161151y40751b9fi2acf02374a202a24@mail.gmail.com>
 <389be9770812161154r507c57b5ja6bfe298071d9731@mail.gmail.com>
 <F966D10F-78E0-43BB-BE02-BCEF6A2367B9@prima.de>
 <20081217183043.GA5560@translab.its.uci.edu>
 <e2111bbb0812171047l1fab40cbmbe99e02864e904fb@mail.gmail.com>
 <20081217191804.GC5560@translab.its.uci.edu>
 <e2111bbb0812171143rfac687ejae1f0b2be705c4d1@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <e2111bbb0812171143rfac687ejae1f0b2be705c4d1@mail.gmail.com>
User-Agent: Mutt/1.4.1i

On Wed, Dec 17, 2008 at 02:43:54PM -0500, Paul Davis wrote:
> On Wed, Dec 17, 2008 at 2:18 PM, James Marca
> <jmarca@translab.its.uci.edu> wrote:
> > On Wed, Dec 17, 2008 at 01:47:47PM -0500, Paul Davis wrote:
> >> James,
> >>
> >> Near as I know, it's not possible to do a depth-first sort of an
> >> arbitrarily deep tree using only a parent link. Someone may yet come
> >> up with a clever trick to do it, but for the moment no one has
> >> thought of a solution.
> >>
> >> You mention storing the entire path but also seem to discard the idea.
> >> Any reason? If you're in a threaded thinger like you got, then you
> >> would have access to the path of the node to which you're replying. And
> >> the topology would be stable so no worries about moving paths etc.
> >> Having the full path should allow you to do what you want fairly
> >> easily. I think.
> >
> > Two reasons I am discounting storing the full path to the parent in
> > each node.  First, if a node only knows its parent, then the parent
> > can move around and not put the node's data into conflict with the
> > parent's.  If on the other hand the node knows its parent's path, then
> > if you move any node in the hierarchy, you have to fix all the
> > descendants' information as well.  I am beginning to understand that
> > couchdb doesn't enforce consistent data (thanks mostly to the nifty
> > figure 2.1 in the draft book), so I'd rather not put stuff in there
> > that expects consistency and then needs to be maintained.  Seems like
> > asking for trouble.
> >
> 
> These are all valid points, but in terms of comments on a blog, how
> often are you going to re-parent nodes in the tree? Assuming you're
> not a fan of revisionist history, I'd guess never. Or even if you
> decide to go back and reengineer the document relations it would be in
> terms of writing scripts to transform the entire db at once.

Yes, well, my application isn't comments on a blog.  I was trying not
to move the original topic too far!

> 
> > Second, and more importantly to my sense of propriety as a programmer,
> > if every node underneath a parent stores the parent's path to the
> > root, that seems like a waste of resources.  I'd rather skip the
> > sorting step altogether and do something else (like relying on the
> > client to build the tree from well structured data, as in my prior
> > post).
> >
> 
> Part of knowing the rules is knowing when to break them. While there
> would be overhead in terms of disk space, you're saving yourself a
> ton of computation. This is one of the core tenets of the CouchDB
> philosophy.
> 

In my app, my bottleneck *is* disk space.  I'm writing new
computed results from raw data every 30 seconds from now to forever,
so of course I have to also roll along and delete older stuff, and
also allow for recreation of older stuff on demand if needed from the
original source data.  If each node carries redundant data, I can
carry just a little bit less ancient history.  

All the writing and deleting happens server side.  The general use
clients only ask for data (that may or may not exist).  Each data
point is independent of all others so couchdb is probably optimal (no
need for db-enforced consistency, and I gain high availability).  

But each data source does exist in a physical, real world hierarchy.
That hierarchy doesn't change all that much, and I can hard code the
searches I use most, but I'm *also* trying to grok the limits of
couchdb while I build this app.

> > But, on the other hand, I *would* like to send a single query that can
> > fetch all comments under a specific comment.  The start key/end key
> > hack with arrays as keys only seems to work if every doc can generate
> > an array with the same first element, second element, etc etc.  I keep
> > thinking there might be a way to write out arrays in reverse order, or
> > maybe only keep the depth as a parameter inside of a doc and fill out
> > empty values or 'Z' for everything preceding depth -2 and depth-1 in
> > the sort array.  But both seem to be dead ends.
> >
> 
> It sure seems like it could almost be done, but the more I look at it
> the more that I think it could actually be proven that it's
> impossible. The closest I've seen is getting a proper sort in N
> queries where N is the maximum depth of the tree. The logic is that to

probably N-1, but I agree. It seems like a 1 request sort would need
N-1 information already written, which perhaps is why the hack in the
blog page with just one level of depth (parent) works in one query.

> get a proper 1 request sort, each node needs to know where in the sort
> it needs to be. And AFAICT, this would be impossible without
> information on the full path to that node. This isn't to say that
> there might be some nifty method for storing that path information in
> constant space though.
> 
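
For what it's worth, here is roughly how I picture the full-path
approach.  This is only a sketch that simulates a view in plain
JavaScript, not something run against a real server.  The "path" field,
the HIGH sentinel, and the helpers buildView() and subtree() are my own
inventions for illustration; in CouchDB itself I believe an empty
object {} is the usual high endkey element, and real string collation
is ICU-based rather than the simple comparison used here.

```javascript
// Assumption: each doc stores "path", the list of ancestor ids from the
// post down to its parent. This field is hypothetical.
const docs = [
  { _id: "FABC1234", type: "comment", path: ["myslug", "DEFABC"] },
  { _id: "ABCDEF",   type: "comment", path: ["myslug"] },
  { _id: "DEFABC",   type: "comment", path: ["myslug"] },
  { _id: "myslug",   type: "post",    path: [] },
];

// Map function in the CouchDB style: the key is the full path plus own id.
function map(doc, emit) {
  if (doc.type === "post" || doc.type === "comment") {
    emit(doc.path.concat([doc._id]), doc._id);
  }
}

// Simplified stand-in for view collation: compare element by element;
// an array sorts before any longer array it is a prefix of.
// HIGH is a local sentinel that sorts after every string.
const HIGH = Symbol("high");
function compareKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] === b[i]) continue;
    if (a[i] === HIGH) return 1;
    if (b[i] === HIGH) return -1;
    return a[i] < b[i] ? -1 : 1;
  }
  return a.length - b.length;
}

// Run the map over every doc and sort the emitted rows by key.
function buildView(allDocs) {
  const rows = [];
  for (const doc of allDocs) {
    map(doc, (key, value) => rows.push({ key, value }));
  }
  return rows.sort((r1, r2) => compareKeys(r1.key, r2.key));
}

// One range scan returns a node and everything beneath it.
function subtree(rows, path) {
  const startkey = path;
  const endkey = path.concat([HIGH]);
  return rows.filter(
    (r) => compareKeys(r.key, startkey) >= 0 && compareKeys(r.key, endkey) <= 0
  );
}

const view = buildView(docs);
const order = view.map((r) => r.value);
const underDEFABC = subtree(view, ["myslug", "DEFABC"]).map((r) => r.value);
```

With the sample docs, order comes out with the post first and each
comment followed directly by its replies, and underDEFABC holds DEFABC
plus its one reply.  The cost is exactly the one I complained about:
every doc carries its ancestors' ids.
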
> > Recursive queries are probably the only way to go, or maybe storing
> > the root post as well as the immediate parent in each "comment" type
> > doc, so that you can get all comments under a doc (which I want), and
> > take a rough stab at an initial sorting of any nested comments---the
> > jQuery type of solution I wrote earlier fails if you try to append a
> > node to a parent that doesn't yet exist.
> >
> 
> The only issue with this though is that it doesn't work with paging.
> Whether that's a concern or not I don't know.
> 

Not a concern of mine for my app.  If I was doing a blog or otherwise
worried about paging long lists of data, I'd probably look at a
client-side data engine, like dojo's data stores.  The initial pull
from the db would populate the data store, then the doc to doc
organization would be done within the store.  While I've never used
them, I recall seeing a paging type of function in dojo's data
docs. Hmm, things have changed in dojo.land, but

http://docs.dojocampus.org/quickstart/data/usingdatastores/pagination

seems relevant.
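
Short of a full data store, the client-side step could be as small as
the sketch below.  It assumes the comments arrive as a flat array
carrying only "_id" and "parent", and it makes two passes so that a
child listed before its parent still gets attached, which is where my
earlier jQuery snippet fell over.  buildTree() and flatten() are names
I made up for illustration, and cycles in the parent links are not
handled.

```javascript
// Build a nested tree from flat rows that only carry a "parent" link.
// Two passes, so a child may appear in the input before its parent.
function buildTree(rootId, comments) {
  const nodes = new Map();
  nodes.set(rootId, { id: rootId, children: [] });

  // Pass 1: create a node for every comment.
  for (const c of comments) {
    nodes.set(c._id, { id: c._id, children: [] });
  }

  // Pass 2: attach each node to its parent. Comments without a parent
  // field hang directly off the post.
  const orphans = [];
  for (const c of comments) {
    const parentId = c.parent == null ? rootId : c.parent;
    const parent = nodes.get(parentId);
    if (parent) {
      parent.children.push(nodes.get(c._id));
    } else {
      orphans.push(c._id); // parent was deleted or never fetched
    }
  }
  return { root: nodes.get(rootId), orphans };
}

// Depth-first walk that lists ids with their nesting depth.
function flatten(node, depth = 0, out = []) {
  out.push({ id: node.id, depth });
  for (const child of node.children) {
    flatten(child, depth + 1, out);
  }
  return out;
}

const comments = [
  { _id: "FABC1234", post: "myslug", parent: "DEFABC" }, // child listed first
  { _id: "ABCDEF", post: "myslug" },
  { _id: "DEFABC", post: "myslug", parent: "myslug" },
];
const { root, orphans } = buildTree("myslug", comments);
const displayOrder = flatten(root).map((n) => n.id);
```

Siblings keep whatever order the query returned them in, so sorting
the view by date would carry through to the display.  Anything whose
parent is missing ends up in orphans instead of silently vanishing.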

My understanding of couchdb after writing this and thinking about the
docs I've read is that the fundamental design principle is that
documents are supposed to be independent.  Otherwise, database
consistency becomes important.  If documents rely only on local
information, all is well.  If you bend or break this rule, you're
asking for trouble down the road and/or must plan for and write code
to clean up the messes that will eventually result.  My app doesn't
*really* need join tables or data from other documents (for sorting,
etc), so the upshot for me is that I'm probably going to change my
current RDBMS-centric way of doing things.

> 
> I've been dealing with a similar issue in regards to threading email
> list archives. I spent a bit of time reading up on different threading
> algorithms until I realized the answer to my question. There is no
> spoon.
> 
> Thinking about it, the best two email UIs I've ever used are Gmail
> and MarkMail. Neither of them uses a hierarchical view. They each
> have single linear threads that are arranged by time of arrival.
> Something in that tells me that the threading issue is really a
> 'insert euphemism that means non-issue'. I wouldn't doubt that there's
> a white paper out there that says as much.
> 
Well, this is neither here nor there, but I use mutt, which I've set to
sort on threads and (secondary) reverse-date-received.  Mostly that does
what I want.

Thanks for the comments
James

> HTH,
> Paul Davis
> 
> >>
> >> Paul Davis
> >>
> >> On Wed, Dec 17, 2008 at 1:30 PM, James Marca
> >> <jmarca@translab.its.uci.edu> wrote:
> >> > On Wed, Dec 17, 2008 at 01:49:20AM +0100, Jan Lehnardt wrote:
> >> >>
> >> >> On 16 Dec 2008, at 20:54, Christopher McComas wrote:
> >> >>
> >> >> >Chris,
> >> >> >Thanks. One question, concern I might have with that would be just
> >> >> >spelling something differently, but that shouldn't be too big of an
> >> >> >issue.
> >> >> >
> >> >> >To my next question, what would be the best way to structure
> >> >> >comments for a blog post, where they have their own author,
> >> >> >timestamp, and entry?  Again, this is fairly straight-forward with
> >> >> >a relational db using a foreign key.
> >> >>
> >> >> Same concept ;)
> >> >>
> >> >> See http://www.cmlenz.net/archives/2007/10/couchdb-joins for details.
> >> >>
> >> >
> >> > Apologies for forking a topic slightly, but this maps onto a problem I
> >> > am having.  And apologies if this has been answered.  I'm new here, I
> >> > *did* look, but I haven't found a solution I like yet.
> >> >
> >> > The article's suggested solution will allow comments nested one-layer
> >> > deep.  Am I missing something, or is it nearly impossible to collect
> >> > comments on comments in one go?  My thought would be to replace "post"
> >> > with "parent", but then the view map can't build the sort order
> >> > properly, no?
> >> >
> >> > For example:
> >> >
> >> > {
> >> >  "_id": "ABCDEF",
> >> >  "_rev": "123456",
> >> >  "type": "comment",
> >> >  "post": "myslug",
> >> >  "author": "jack",
> >> >  "content": "…"
> >> > }, {
> >> >  "_id": "DEFABC",
> >> >  "_rev": "123456",
> >> >  "type": "comment",
> >> >  "post": "myslug",
> >> >  "parent": "myslug",
> >> >  "author": "jane",
> >> >  "content": "…"
> >> > }, {
> >> >  "_id": "FABC1234",
> >> >  "_rev": "123456",
> >> >  "type": "comment",
> >> >  "post": "myslug",
> >> >  "parent": "DEFABC",
> >> >  "author": "john",
> >> >  "content": "…"
> >> > }
> >> >
> >> > Winging it with untested code, the best guess I can make for nested
> >> > sorting is something like:
> >> >
> >> > function(doc) {
> >> >  if (doc.type == "post") {
> >> >    emit([doc._id, 0], doc);
> >> >  } else if (doc.type == "comment") {
> >> >    if(doc.parent == null || doc.parent == doc.post){
> >> >         // could have a date here for the second sort key?
> >> >         emit([doc.post, doc._id, 1], doc);
> >> >    }else{
> >> >         // this fails for arbitrarily deep nesting.
> >> >         emit([doc.post,doc.parent,doc._id],doc);
> >> >    }
> >> >  }
> >> > }
> >> >
> >> > As I understand it, the problem is that without storing the complete
> >> > hierarchy of comments, you can't reproduce the correct nested sorting
> >> > in one go.  To quote the "how to store hierarchical data" page in the
> >> > wiki, "Store the full path to each node as an attribute in that node's
> >> > document."
> >> >
> >> > On the other hand, a perfectly valid solution that uses client-side
> >> > javascript to build the doc (this is a blog after all) would be to
> >> > just use dom functions to append to parents, something like
> >> >
> >> > jQuery.each(commentArray, function(){
> >> >        jQuery("#"+this.parent)
> >> >         .append("<div id='"+this._id+"' class='comment'>"
> >> >                 +this.content
> >> >                 +"</div>");
> >> > });
> >> >
> >> > While this makes it possible to nest comments on the page of
> >> > most browsers that support jQuery etc., my real question is about the
> >> > inner workings of couchdb, whether it is possible to make the sort
> >> > with some clever view definition trickery.
> >> >
> >> > Note that I have absolutely zero clue about reduce functions and their
> >> > uses.  Maybe you can use reduce to generate arbitrarily deep nesting
> >> > of comments with just a "parent" field??
> >> >
> >> > James
> >> >
> >> >> Cheers
> >> >> Jan
> >> >> --
> >> >>
> >> >>
> >> >> >
> >> >> >
> >> >> >Thanks,
> >> >> >
> >> >> >On Tue, Dec 16, 2008 at 2:51 PM, Chris Anderson <jchris@gmail.com>
> >> >> >wrote:
> >> >> >
> >> >> >>On Tue, Dec 16, 2008 at 11:46 AM, Christopher McComas
> >> >> >><mccomas.chris@gmail.com> wrote:
> >> >> >>>Would it be wrong to try to do the category piece as related in
> >> >> >>>CouchDB?
> >> >> >>>What would be the best way to do it, so that you can have a page,
> >> >> >>>myblog.com/categories/this-category/ that'd then display all the
> >> >> >>>entries
> >> >> >>for
> >> >> >>>that category? What would be proper?
> >> >> >>
> >> >> >>Having a category field on the blog post itself is a fine way to do
> >> >> >>this.
> >> >> >>
> >> >> >>Eg:
> >> >> >>
> >> >> >>{
> >> >> >>"title":"Blah",
> >> >> >>"author":"Chris",
> >> >> >>"category":"music",
> >> >> >>"date": ...
> >> >> >>}
> >> >> >>
> >> >> >>Writing a view that sorts posts by category and date would be simple
> >> >> >>with this sort of data structure. Of course if you wanted to rename a
> >> >> >>category later you'd need to touch all the documents that listed it,
> >> >> >>so this solution is more like tagging than categories, but should
> >> >> >>fulfill the need.
> >> >> >>
> >> >> >>
> >> >> >>--
> >> >> >>Chris Anderson
> >> >> >>http://jchris.mfdz.com
> >> >> >>
