couchdb-user mailing list archives

From Robert Dionne <dio...@dionne-associates.com>
Subject Re: Defining my document model when the source is entity-relationship
Date Fri, 10 Jul 2009 11:15:05 GMT
I think it might be useful to go with your first instincts, that a  
book is a document, and then see how to best organize the information  
you have about the book in support of some application, rather than  
trying to model an existing relational schema that was designed to  
solve a perhaps different problem (organizing books in a library).  
Maybe one doesn't care that the book "Morbus rules" is only one copy  
produced by a publisher. Maybe the domain of interest is the chemical  
make-up of paper and what attracts undisciplined dogs to piss on books.
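
To make the two options from the original question concrete, here is a minimal sketch in Python. It assumes nothing about CouchDB beyond documents being JSON objects with an `_id`. Every field name and value is made up for illustration, and `split_wemi` is a hypothetical helper, not part of any library. The first dict is the "a book is a document" shape; the helper derives the four-linked-documents shape from it.

```python
import json

# Single self-contained document: one book, WEMI levels nested inside it.
# All field names and values are illustrative assumptions.
book = {
    "_id": "morbus-rules-my-copy",
    "work": {"title": "Morbus Rules", "author": "Morbus Iff"},
    "expression": {"form": "text", "language": "en"},
    "manifestation": {"format": "paperback", "publisher": "Example Press"},
    "item": {"owner": "you", "signed": True, "condition": "stained"},
}

LEVELS = ["work", "expression", "manifestation", "item"]


def split_wemi(doc):
    """Split the single document into four documents, each pointing at its parent."""
    docs = []
    parent_id = None
    for level in LEVELS:
        part = dict(doc[level])              # copy so the original is untouched
        part["_id"] = f"{doc['_id']}:{level}"
        part["type"] = level
        if parent_id is not None:
            part["parent"] = parent_id       # link to the level above
        docs.append(part)
        parent_id = part["_id"]
    return docs


linked = split_wemi(book)
print(json.dumps(linked[-1], indent=2))      # the Item document
```

Which shape is better depends on the application: the single document keeps everything about one copy together, while the linked form lets many Items share one Manifestation at the cost of following the `parent` references.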

As you read in the first chapter, CouchDB is about schema-less  
document databases. Schema evolution in relational models is the  
source of many problems. In many web application contexts,  
information is a lot more dynamic, and often incomplete or inaccurate. If  
you're in a domain such as banking or libraries, existing relational  
schemas are great. Your earlier comment about folksonomies versus  
controlled vocabularies called to mind an excellent essay [1] by Clay  
Shirky. One of the main examples in the essay is library catalogs and  
you may find it useful for thinking about the problem.
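
As a rough illustration of what schema-less buys you, the sketch below indexes free-form tags across documents that do not share the same fields. It is plain Python imitating the idea of a map function; CouchDB views themselves are normally written in JavaScript, and the names `map_tags` and `build_index` and all of the sample documents are assumptions made up for this example.

```python
from collections import defaultdict

# Documents with different, incomplete sets of fields (all made up).
docs = [
    {"_id": "a", "title": "Morbus Rules", "tags": ["signed", "used"]},
    {"_id": "b", "title": "Untitled pamphlet"},                       # no tags
    {"_id": "c", "tags": ["used"], "paper": {"acid_free": False}},    # no title
]


def map_tags(doc):
    """Emit (tag, doc id) pairs; documents without tags emit nothing."""
    for tag in doc.get("tags", []):
        yield tag, doc["_id"]


def build_index(documents):
    """Collect the emitted pairs into a tag -> [doc ids] lookup."""
    index = defaultdict(list)
    for doc in documents:
        for key, value in map_tags(doc):
            index[key].append(value)
    return dict(index)


index = build_index(docs)
print(index)
```

Nothing here requires the documents to agree on a schema up front; a missing field simply means the document does not show up under that key.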

Cheers,

Bob



[1] http://www.shirky.com/writings/ontology_overrated.html




On Jul 9, 2009, at 2:12 PM, Morbus Iff wrote:

>
> Hello!
>
> I know nothing about CouchDB (woohoo!)
>
> You can all blame nslater for this mail (boo! hiss!)
>
> I don't really have a huge interest in learning CouchDB - it's more of a
> passing "huh", based solely cos nslater keeps talking about the damn
> thing on IRC all the time. But, I'd figure that if I'm going to rib him
> about working on some crazy new technology, I might as well base my
> flaming and puerile hatred on actual facts and usage, yeah? ;)
>
> So, to satisfy said passing "huh", my pet project will be to implement
> FRBR within CouchDB. FRBR is a librarian tech which basically models a
> way to talk about works of creation. It's relatively new in the scheme
> of things (the librarian world moves a lot slower than the internet
> world). The design of FRBR, however, is mostly based around the idea of
> relational databases, which is exactly what CouchDB purportedly isn't.
>
> Right.
>
> I've already, four or five years ago, taken the FRBR spec and converted
> it into a set of MySQL relational tables. The earliest thing I can do
> with CouchDB, however, is think about how FRBR fits into the
> "Self-Contained Data" model of /relax/why-couchdb.
>
> To quote from WP:Functional_Requirements_for_Bibliographic_Records:
>
>   Group 1 entities are Work, Expression, Manifestation, and Item (WEMI).
>   They represent the products of intellectual or artistic endeavour.
>
>   Group 2 entities are person and corporate body, responsible for
>   the custodianship of Group 1’s intellectual or artistic endeavour.
>
>   Group 3 entities are subjects of Group 1 or Group 2’s intellectual
>   endeavour, and include concepts, objects, events, places.
>
> There are some swank charts on WP showing this model.
>
> http://en.wikipedia.org/wiki/File:FRBR-Group-1-entities-and-basic-relations.svg
> http://en.wikipedia.org/wiki/File:FRBR-Group-2-entities-and-relations.svg
>
> The simplest question, really, is:
>
>  * Should all these Group entities be individual docs...
>  * ... or should they all be a single document inside CouchDB?
>
> Perhaps I should start out with an example of what FRBR is and isn't.
>
> You own a book called "Morbus Rules". It's signed by Morbus, but your
> dog took a piss on it, so the bottom half is slightly stained. At first,
> you'd say to yourself, well, "hey! that's a document! why, it's just
> like the business card or address book analogy we love to use!"
>
> Right. It is.
>
> But, in FRBR, that simple book is a lot more complex. That simple book
> is an "Item" (your personal copy) of a "Manifestation" (all other books
> that are the same printing from the same publisher) of an "Expression"
> (all versions of this book that share the exact same creative parts) of
> a "Work" (the theoretical hand-waving artistic/creative "feeling", which
> could be expressed as a book, a musical, an interactive DVD, etc.). It
> has various "People" and "Companies" involved (that could change between
> Work, Expression, and Item - i.e., you are the Person:Owner of this
> Item, but the Person:Author is always the same for any of these
> Expressions). Concepts, locations, and other tag-like thingies also
> apply to this Manifestation (and potentially, to the Item itself, like
> "dog pissed on it" or the more polite "used").
>
> /me coughs. You, in the back, wake up!
>
> Should FRBR Group 1 entities (the combined mega-Thing of Work,
> Expression, Manifestation, and Item; "WEMI") be a single document within
> CouchDB? Or should they each be their own document which somehow relates
> to all the others?
>
> Things like tags and identifiers (this book's ISSN, DOI, ISBN, UPC, etc.)
> I can easily see as being part of the Self-Contained Data of a document.
> But I'm not sure if there should only be one JSON document called
> "Work", with it containing all the other major pieces of a creative
> endeavor, or if it should be four major documents (W, E, M, I) with
> relations to each other.
>
> I don't expect you to understand FRBR fully; I'm just trying to fit a
> design that was specifically made /for/ relational databases into
> something that was specifically made /not for/ the relational approach.
>
> -- 
> Morbus Iff ( tomorrow never comes until it's too late )
> Technical: http://www.oreillynet.com/pub/au/779
> Enjoy: http://www.disobey.com/ and http://www.videounderbelly.com/
> aim: akaMorbus / skype: morbusiff / icq: 2927491 / jabber.org: morbus
>
>
>

