couchdb-user mailing list archives

From "Paul Davis" <paul.joseph.da...@gmail.com>
Subject Re: General Q about CouchDB
Date Wed, 17 Dec 2008 19:43:54 GMT
On Wed, Dec 17, 2008 at 2:18 PM, James Marca
<jmarca@translab.its.uci.edu> wrote:
> On Wed, Dec 17, 2008 at 01:47:47PM -0500, Paul Davis wrote:
>> James,
>>
>> Near as I know, it's not possible to do a depth-first sort of an
>> arbitrarily deep tree using only a parent link. Someone may yet come up
>> with a clever trick to do it, but for the moment no one has thought of a
>> solution.
>>
>> You mention storing the entire path but also seem to discard the idea.
>> Any reason? If you're in a threaded thinger like you've got, then you
>> would have access to the path of the node to which you're replying. And
>> the topology would be stable so no worries about moving paths etc.
>> Having the full path should allow you to do what you want fairly
>> easily. I think.
>
> Two reasons I am discounting storing the full path to the parent in
> each node.  First, if a node only knows its parent, then the parent
> can move around and not put the node's data into conflict with the
> parent's.  If on the other hand the node knows its parent's path, then
> if you move any node in the hierarchy, you have to fix all the
> descendants' information as well.  I am beginning to understand that
> couchdb doesn't enforce consistent data (thanks mostly to the nifty
> figure 2.1 in the draft book), so I'd rather not put stuff in there
> that expects consistency and then needs to be maintained.  Seems like
> asking for trouble.
>

These are all valid points, but in terms of comments on a blog, how
often are you going to re-parent nodes in the tree? Assuming you're
not a fan of revisionist history, I'd guess never. Or even if you
decide to go back and reengineer the document relations it would be in
terms of writing scripts to transform the entire db at once.
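
To give an idea of what such a one-off transform could look like, here is a rough, untested-against-CouchDB sketch that derives an ancestor path for every comment from its parent link. The `path` field name and the `addPaths` helper are my own inventions for illustration, and it assumes all the comments fit in memory and the parent links contain no cycles.

```javascript
// One-off migration sketch: derive each comment's ancestor path from its
// parent link. `path` and `addPaths` are hypothetical names.
function addPaths(comments) {
  var byId = Object.create(null);
  comments.forEach(function (c) { byId[c._id] = c; });

  function pathOf(c) {
    if (c.path) return c.path; // already computed (an empty array is truthy)
    var parent = byId[c.parent];
    // A top-level comment (parent is the post, or absent) gets an empty path.
    c.path = parent ? pathOf(parent).concat([parent._id]) : [];
    return c.path;
  }
  comments.forEach(pathOf);
  return comments;
}

var docs = addPaths([
  { _id: "FABC1234", post: "myslug", parent: "DEFABC" },
  { _id: "DEFABC", post: "myslug", parent: "myslug" },
  { _id: "ABCDEF", post: "myslug" }
]);
console.log(docs.map(function (d) {
  return d._id + " " + JSON.stringify(d.path);
}));
```

The updated docs would then be written back to the database in whatever way suits you; that part is left out here.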

> Second, and more importantly to my sense of propriety as a programmer,
> if every node underneath a parent stores the parent's path to the
> root, that seems like a waste of resources.  I'd rather skip the
> sorting step altogether and do something else (like rely on the
> client to build the tree from well structured data, as in my prior
> post).
>

Part of knowing the rules is knowing when to break them. While there
would be overhead in terms of disk space, you're saving yourself a
ton of computation. This is one of the core tenets of the CouchDB
philosophy.
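
As a rough sketch of that trade-off: if each comment carries its ancestors, the map function can build the sort key without looking anything else up. The `path` field and its layout are assumptions of mine, not something taken from the wiki page, and `emit` is stubbed out so the snippet runs as plain JavaScript.

```javascript
// Hypothetical comment doc: `path` lists the ancestor comment ids, oldest
// first. The field name and layout are assumptions for illustration.
var reply = {
  _id: "FABC1234",
  type: "comment",
  post: "myslug",
  parent: "DEFABC",
  path: ["DEFABC"],
  author: "john"
};

// Map function in the usual CouchDB style. The key is the post, then the
// ancestors, then the comment itself, so nothing is computed at query time.
function map(doc) {
  if (doc.type == "comment") {
    emit([doc.post].concat(doc.path, [doc._id]), null);
  }
}

// Stand-in for CouchDB's emit() so the sketch runs outside the view server.
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

map(reply);
console.log(JSON.stringify(rows[0].key)); // ["myslug","DEFABC","FABC1234"]
```

The extra disk space is the `path` array repeated in every descendant; what you get back is a key that is ready to sort.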

> But, on the other hand, I *would* like to send a single query that can
> fetch all comments under a specific comment.  The start key/end key
> hack with arrays as keys only seems to work if every doc can generate
> an array with the same first element, second element, etc etc.  I keep
> thinking there might be a way to write out arrays in reverse order, or
> maybe only keep the depth as a parameter inside of a doc and fill out
> empty values or 'Z' for everything preceding depth -2 and depth-1 in
> the sort array.  But both seem to be dead ends.
>

It sure seems like it could almost be done, but the more I look at it
the more that I think it could actually be proven that it's
impossible. The closest I've seen is getting a proper sort in N
queries where N is the maximum depth of the tree. The logic is that to
get a proper 1 request sort, each node needs to know where in the sort
it needs to be. And AFAICT, this would be impossible without
information on the full path to that node. This isn't to say that
there might be some nifty method for storing that path information in
constant space though.
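
To make the "each node needs to know where in the sort it belongs" point concrete, here is a small simulation. It assumes keys of the form [post, ancestor ids..., own id] and uses a simplified stand-in for view collation that only handles strings, arrays, and the `{}` high sentinel; real CouchDB compares strings with ICU rules, so treat this as an approximation. The ids are made up.

```javascript
// Rough stand-in for CouchDB view collation, covering only what this
// sketch needs: strings compare with each other, an object sorts after any
// string, and an array sorts before a longer array it is a prefix of.
function compareItems(a, b) {
  var aObj = typeof a == "object", bObj = typeof b == "object";
  if (aObj || bObj) return (aObj ? 1 : 0) - (bObj ? 1 : 0);
  return a < b ? -1 : (a > b ? 1 : 0);
}
function compareKeys(a, b) {
  for (var i = 0; i < Math.min(a.length, b.length); i++) {
    var c = compareItems(a[i], b[i]);
    if (c != 0) return c;
  }
  return a.length - b.length;
}

// Keys as a path-based map function might emit them.
var keys = [
  ["myslug", "c2"],
  ["myslug", "c1", "c4", "c5"],
  ["myslug", "c1"],
  ["myslug", "c1", "c3"],
  ["myslug", "c1", "c4"]
];
keys.sort(compareKeys);
// keys is now in depth-first order: c1, c1/c3, c1/c4, c1/c4/c5, c2

// Everything under comment c1 is one contiguous key range, which is what
// startkey/endkey select in a single view query.
var startkey = ["myslug", "c1"];
var endkey = ["myslug", "c1", {}];
var subtree = keys.filter(function (k) {
  return compareKeys(k, startkey) >= 0 && compareKeys(k, endkey) <= 0;
});
console.log(subtree.map(function (k) { return k.join("/"); }));
```

Note that siblings come out ordered by id here, not by date; getting them in time order would mean putting something time-based into the path elements as well.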

> Recursive queries are probably the only way to go, or maybe storing
> the root post as well as the immediate parent in each "comment" type
> doc, so that you can get all comments under a doc (which I want), and
> take a rough stab at an initial sorting of any nested comments---the
> jQuery type of solution I wrote earlier fails if you try to append a
> node to a parent that doesn't yet exist.
>

The only issue with this though is that it doesn't work with paging.
Whether that's a concern or not I don't know.
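
For what it's worth, the append-to-a-parent-that-doesn't-exist-yet problem goes away if the client indexes all the comments by id first and links them up in a second pass. A minimal sketch in plain JavaScript, no jQuery; `buildTree` and the `children` property are names I'm making up for illustration. It still needs the whole comment list up front, so the paging caveat stands.

```javascript
// Build a nested tree from a flat list of comments that only know their
// parent. The order of the input does not matter.
function buildTree(comments, postId) {
  var byId = Object.create(null);
  var roots = [];
  // Pass 1: index every comment by id and give it an empty child list.
  comments.forEach(function (c) {
    byId[c._id] = {
      _id: c._id, parent: c.parent, content: c.content, children: []
    };
  });
  // Pass 2: attach each comment to its parent, or to the top level when
  // the parent is the post itself, is absent, or is not in the list.
  comments.forEach(function (c) {
    var node = byId[c._id];
    var parent = byId[c.parent];
    if (c.parent == null || c.parent == postId || !parent) {
      roots.push(node);
    } else {
      parent.children.push(node);
    }
  });
  return roots;
}

// The child arrives before its parent, which breaks append-as-you-go code.
var tree = buildTree([
  { _id: "FABC1234", parent: "DEFABC", content: "reply" },
  { _id: "DEFABC", parent: "myslug", content: "top-level" }
], "myslug");
console.log(JSON.stringify(tree, null, 1));
```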

> James
>

I've been dealing with a similar issue in regards to threading email
list archives. I spent a bit of time reading up on different threading
algorithms until I realized the answer to my question. There is no
spoon.

Thinking about it, the best two email UIs I've ever used are Gmail
and MarkMail, neither of which uses a hierarchical view. They each
have single linear threads that are arranged by time of arrival.
Something in that tells me that the threading issue is really a
'insert euphemism that means non-issue'. I wouldn't doubt that there's
a white paper out there that says as much.
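
If a flat, time-ordered thread is good enough, the view becomes trivial. A sketch, assuming each comment has a `timestamp` field holding an ISO 8601 UTC string (the field name is my assumption; the example docs in this thread don't show one). `emit` and the sort are stubbed so it runs as plain JavaScript.

```javascript
// Map function for a flat, time-ordered thread. `timestamp` is an assumed
// field holding an ISO 8601 UTC string, which sorts correctly as text.
function map(doc) {
  if (doc.type == "comment") {
    emit([doc.post, doc.timestamp], doc.author);
  }
}

// Stand-ins for CouchDB's emit() and view ordering, just for this demo.
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

[
  { type: "comment", post: "myslug", timestamp: "2008-12-17T19:43:54Z", author: "john" },
  { type: "comment", post: "myslug", timestamp: "2008-12-17T18:47:47Z", author: "jack" },
  { type: "comment", post: "myslug", timestamp: "2008-12-17T19:18:00Z", author: "jane" }
].forEach(function (doc) { map(doc); });

rows.sort(function (a, b) {
  // Every row is for the same post here, so compare the timestamps only.
  return a.key[1] < b.key[1] ? -1 : (a.key[1] > b.key[1] ? 1 : 0);
});
console.log(rows.map(function (r) { return r.value; }));
```

A flat key like this should also page naturally, since the next page is just the next key range.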

HTH,
Paul Davis

>>
>> Paul Davis
>>
>> On Wed, Dec 17, 2008 at 1:30 PM, James Marca
>> <jmarca@translab.its.uci.edu> wrote:
>> > On Wed, Dec 17, 2008 at 01:49:20AM +0100, Jan Lehnardt wrote:
>> >>
>> >> On 16 Dec 2008, at 20:54, Christopher McComas wrote:
>> >>
>> >> >Chris,
>> >> >Thanks. One question, concern I might have with that would be just
>> >> >spelling something differently, but that shouldn't be too big of an
>> >> >issue.
>> >> >
>> >> >To my next question, what would be the best way to structure
>> >> >comments for a blog post, where they have their own author,
>> >> >timestamp, and entry?  Again, this is fairly straight-forward with
>> >> >a relational db using a foreign key.
>> >>
>> >> Same concept ;)
>> >>
>> >> See http://www.cmlenz.net/archives/2007/10/couchdb-joins for details.
>> >>
>> >
>> > Apologies for forking a topic slightly, but this maps onto a problem I
>> > am having.  And apologies if this has been answered.  I'm new here, I
>> > *did* look, but I haven't found a solution I like yet.
>> >
>> > The article's suggested solution will allow comments nested one-layer
>> > deep.  Am I missing something, or is it nearly impossible to collect
>> > comments on comments in one go?  My thought would be to replace "post"
>> > with "parent", but then the view map can't build the sort order
>> > properly, no?
>> >
>> > For example:
>> >
>> > {
>> >  "_id": "ABCDEF",
>> >  "_rev": "123456",
>> >  "type": "comment",
>> >  "post": "myslug",
>> >  "author": "jack",
>> >  "content": "…"
>> > }, {
>> >  "_id": "DEFABC",
>> >  "_rev": "123456",
>> >  "type": "comment",
>> >  "post": "myslug",
>> >  "parent": "myslug",
>> >  "author": "jane",
>> >  "content": "…"
>> > }, {
>> >  "_id": "FABC1234",
>> >  "_rev": "123456",
>> >  "type": "comment",
>> >  "post": "myslug",
>> >  "parent": "DEFABC",
>> >  "author": "john",
>> >  "content": "…"
>> > }
>> >
>> > Winging it with untested code, the best guess I can make for nested
>> > sorting is something like:
>> >
>> > function(doc) {
>> >  if (doc.type == "post") {
>> >    emit([doc._id, 0], doc);
>> >  } else if (doc.type == "comment") {
>> >    if(doc.parent == null || doc.parent == doc.post){
>> >         // could have a date here for the second sort key?
>> >         emit([doc.post, doc._id, 1], doc);
>> >    }else{
>> >         // this fails for arbitrarily deep nesting.
>> >         emit([doc.post,doc.parent,doc._id],doc);
>> >    }
>> >  }
>> > }
>> >
>> > As I understand it, the problem is that without storing the complete
>> > hierarchy of comments, you can't reproduce the correct nested sorting
>> > in one go.  To quote the "how to store hierarchical data" page in the
>> > wiki, "Store the full path to each node as an attribute in that node's
>> > document."
>> >
>> > On the other hand, a perfectly valid solution that uses client-side
>> > javascript to build the doc (this is a blog after all) would be to
>> > just use dom functions to append to parents, something like
>> >
>> > jQuery.each(commentArray, function(){
>> >        jQuery("#"+this.parent)
>> >         .append("<div id='"+this._id+"' class='comment'>"
>> >                 +this.content
>> >                 +"</div>");
>> > });
>> >
>> > While this makes it possible to nest comments on the page in
>> > most browsers that support jQuery etc., my real question is about the
>> > inner workings of couchdb, whether it is possible to make the sort
>> > with some clever view definition trickery.
>> >
>> > Note that I have absolutely zero clue about reduce functions and their
>> > uses.  Maybe you can use reduce to generate arbitrarily deep nesting
>> > of comments with just a "parent" field??
>> >
>> > James
>> >
>> >> Cheers
>> >> Jan
>> >> --
>> >>
>> >>
>> >> >
>> >> >
>> >> >Thanks,
>> >> >
>> >> >On Tue, Dec 16, 2008 at 2:51 PM, Chris Anderson <jchris@gmail.com>
>> >> >wrote:
>> >> >
>> >> >>On Tue, Dec 16, 2008 at 11:46 AM, Christopher McComas
>> >> >><mccomas.chris@gmail.com> wrote:
>> >> >>>Would it be wrong to try to do the category piece as related in
>> >> >>>CouchDB?
>> >> >>>What would be the best way to do it, so that you can have a page,
>> >> >>>myblog.com/categories/this-category/ that'd then display all the
>> >> >>>entries for that category? What would be proper?
>> >> >>
>> >> >>Having a category field on the blog post itself is a fine way to do
>> >> >>this.
>> >> >>
>> >> >>Eg:
>> >> >>
>> >> >>{
>> >> >>"title":"Blah",
>> >> >>"author":"Chris",
>> >> >>"category":"music",
>> >> >>"date": ...
>> >> >>}
>> >> >>
>> >> >>Writing a view that sorts posts by category and date would be simple
>> >> >>with this sort of data structure. Of course if you wanted to rename a
>> >> >>category later you'd need to touch all the documents that listed it,
>> >> >>so this solution is more like tagging than categories, but should
>> >> >>fulfill the need.
>> >> >>
>> >> >>
>> >> >>--
>> >> >>Chris Anderson
>> >> >>http://jchris.mfdz.com
>> >> >>
>> >
>> > --
>> > This message has been scanned for viruses and
>> > dangerous content by MailScanner, and is
>> > believed to be clean.
>> >
>> >
