incubator-couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sean Copenhaver <sean.copenha...@gmail.com>
Subject Re: Database design question
Date Wed, 28 Dec 2011 01:49:11 GMT
Hmm.. I'll take a stab at giving you some guidelines and suggestions. I hope it helps but I'll
warn you I haven't done anything with CouchDB in awhile and I'm not testing my map/reduce
and queries. Keep that in mind.  

Something that raised an eyebrow is that you are trying to model your document (known for
being non-relational) data similar to your tabular relational model data.

So something that you do with relational databases is normalize your data to try and store
it only once. You then have to join the data back together piecemeal to get what you want.
Document databases are usually naturally denormalized in comparison and you have shallow and
few relationships. The documents can tend to represent the core ideas for your application
and may match more to how you work with the data.

Naturally the trade off is storage costs, update costs for the duplicated data, but could
provide simpler and easy to work with data model.

Another thing to keep in mind is that defining a map function (as in the map part of a view)
is defining the index you want to do range queries on. A map function is not a query, but
a clustered index on the key/value pair which point back to the original document (you could
say they act as a non-clustered index in that manner). From your e-mail I wasn't sure if that
was understood.

With that said, I would suggest storing the gallery information inside the artwork document.
This duplicates the data but I'm guessing you would rarely modify the galleries information
and more likely add new galleries or maybe switch out the gallery info for an artwork document.
You could do something similar with the artist information considering that this might not
change very often as well. Also keep in mind that you don't have to use GUIDs for your keys.
If the artist's full name or something similar provides the uniqueness you could use that.

Perhaps the artist and gallery documents stay the same but the artwork document looks like:

{
    "type": "artwork",
    "title": "Guernica",
    "medium": "Oil on canvas",
    "dimensions": "349 cm x 776 cm (137.4 in x 305.5 in)",
    "artist": "Pablo Picasso",
    "birth_year": 1881,
    "artist_id": "<some guid>",
    "gallery_id": "<some guid>",
    "gallery": "Museo Reina Sofia"
}

That's just a possibility. I wouldn't store all the info in the artwork document but just
the major bits you know are less likely to change. Storing a document like this means you
can retrieve all the relevant pieces of information in one query against a map function defined
as:

function (doc) {
    if (doc.type === 'artwork') {
        emit(doc._id, null);
    }
}

Notice that I left out the document as a value for the emit() call. This is because the document's
id is always stored with the key/value pair. Emitting the document for the value is a trade
off of retrieve speed versus storage/update costs.  

Anyway you can then query this and ask CouchDB to retrieve the documents. Something like this
with paging:

    http://server/db/_design/art/_view/artwork?start_key="<guid>"&limit=101&include_docs=true

You could even not have a gallery document if you didn't need any extra info with them. You
could then do a map/reduce on the artwork documents to get the list of galleries your data
represents.

"galleries" : {
    "map": function (doc) {
        if (doc.type === 'artwork') {
            emit(doc.gallery, 1);
        }
    },

    "reduce": "_sum"
}

This example query would retrieve the galleries and the number of artworks for each:

    http://server/db/_design/art/_view/galleries?group=true

Assuming that the name is unique enough to specify the gallery that is. If you did need to
add extra data you could give the gallery document an id of it's name to link the two.

As I said before you want shallow and few relationships but you can retrieve them together
if you are trying to go the other direction of the relationship. In this situation if you
wanted to retrieve an artist with all their artworks.

A map function such as:

function (doc) {
    if (doc.type === 'artist') {
        emit([doc._id, 0], null);
    }
    else if (doc.type === 'artwork') {
        emit([doc.artist_id, 1], null);
    }
}

would allow queries to retrieve an artist and all their artworks. You just have to know that
the 0 is the artist and the 1 is the artwork in the results. I use those numbers to simply
order the artist first. A query on the view with the artist's guid would allow you to retrieve
all the artworks as well.

    http://server/db/_design/art/_view/artist_and_works?key="[<guid>]"&include_docs=true

Ok, dang. I may have gone off on a tangent but I hope this helps with your CouchDB understanding
and data modeling. Here are some links as well that may help.

Relationships in CouchDB:
https://wiki.apache.org/couchdb/EntityRelationship

Views intro:
https://wiki.apache.org/couchdb/Introduction_to_CouchDB_views

View query options:
https://wiki.apache.org/couchdb/HTTP_view_API#Querying_Options

How your keys sort the data in a view:
https://wiki.apache.org/couchdb/View_collation

Remember you can retrieve multiple ids at once:
https://wiki.apache.org/couchdb/HTTP_Bulk_Document_API#Fetch_Multiple_Documents_With_a_Single_Request

The questions many don't think to ask, incomplete but a couple with good info:
https://wiki.apache.org/couchdb/FUQ

Also I swear MongoDB's doc's had a decent page on document database data modeling but I can't
seem to dig it up right now.

If there are any mistakes I hope others with chime in.  

--  
Sean Copenhaver



On Tuesday, December 27, 2011 at 4:31 PM, Steven Ringo wrote:

> Hi all,
>  
> (I originally posted this on the mobile-couchbase list, but was  
> recommended to post it here instead, since this is more of a general  
> couchdb question)
>  
> I am struggling with something that I know would be quite trivial to do  
> in SQL, but seems to be a lot more complex to do in couchdb. Perhaps 20  
> years of experience with SQL is clouding my judgement and so I am trying  
> rather to break away from the SQL/relational mindset and understand how  
> this would be better designed in couch.
>  
> I am developing an iOS art gallery app, and wish to display information  
> about artworks, artists and galleries.
>  
> The menu screen requires a list of all artworks together with the  
> gallery they are currently housed at and the artist that created the  
> artwork:
>  
> * artwork title
> * artwork medium
> * artwork dimensions
> * artist full_name
> * gallery name,
> * gallery city
>  
> for each artwork, respectively.
>  
> Imagine the following objects in my database (I am not including all the  
> fields for sake of brevity. i.e. only the ones that are important).
>  
> | Artist
> ------
> _id
> full_name
> birth_year
> birth_place
>  
> Gallery
> -------
> _id
> name
> city
>  
> Artwork
> -------
> _id
> title
> medium
> dimensions
> artist_id_ (foreign key)
> gallery_id_ (foreign key)
> |
>  
> In SQL it would be as easy as one query with two joins:
>  
> |SELECT
> artwork.title,
> artwork.medium,
> artwork.dimensions,
> artist.full_name,
> gallery.name (http://gallery.name),
> gallery.city
> FROM
> artwork INNER JOIN artist
> ON artwork.artist_id_ = artist._id INNER JOIN gallery
> ON artwork.gallery_id_ = gallery._id
> |
>  
> From this I would be able to get the data I need in one query (i.e. one  
> call to the database).
>  
> | Guernica | Pablo Picasso | Museo Reina Sofia, Madrid, Spain
> Mona Lisa | Leonardo da Vinci | Musée du Louvre, Paris, France
> David | Michelangelo | Uffizi Gallery, Florence, Italy.
> |
>  
> Using couchdb I am storing the data in a very similar way to the way I  
> would in a relational database, represented as something along the lines of:
>  
> |{
> "type" : "Artwork",
> "_id" : "d665afaedde12b44c5159cf1782c5c1e",
> "_rev" : "1-446201af8a1ddfa3759164a01be899de",
> "artist_id" : "d665afaedde12b44c5159cf1782c2d0b",
> "gallery_id" : "d665afaedde12b44c5159cf1782d44af",
> "title" : "Guernica",
> "medium" : "Oil on canvas",
> "dimensions" : "349 cm × 776 cm (137.4 in × 305.5 in)"
> }
>  
> {
> "type" : "Artist",
> "_id" : "d665afaedde12b44c5159cf1782c2d0b",
> "_rev" : "1-8fa8a6dd50d9d9072b08db1a840128b1",
> "full_name" : "Pablo Picasso",
> "birth_year" : "1881"
> }
>  
> {
> "type" : "Gallery",
> "_id" : "d665afaedde12b44c5159cf1782d44af",
> "_rev" : "1-27a139e40a7f88d6465812eec3c73a0f",
> "name" : "Museo Reina Sofia"
> }
> |
>  
> To get the same result in couch, I would have to do one query to get all  
> the artworks, e.g.
>  
> |function(doc) {
> if(doc.type&& doc.type == 'Artist') {
> emit(doc._id,doc);
> }
> }
> |
>  
> then I would need to do a |GET| for the _id of each |Gallery| and  
> |Artist| returned per |Artwork| respectively to fill in the missing  
> information. This is usually known as the |n+1| queries problem (in this  
> case |2n+1|).
>  
> I have always worked on the assumption that the database should be  
> leveraged to do as much of the heavy lifting as possible with as few  
> network hits as possible (i.e. SQL is always faster than client  
> manipulation). With a list of 120 artworks, this means 241 HTTP GETs to  
> the database.
>  
> I am aware of "linked documents" in views  
> (http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views#Linked_documents),  
> however this seems to only work with one related row.
>  
> I can't really see how "view collation"  
> (http://wiki.apache.org/couchdb/View_collation) would work here either.  
> With view collation the correlated information ends up being on  
> different rows, and I think that would be better suited for where say a  
> single Artists or Gallery is displayed with associated Artworks. My case  
> is the opposite, whereby Artworks plus corresponding information from  
> Artists and Galleries is required.
>  
> I am also aware that one could embed each gallery or artist in each  
> artwork or vice versa, but that seems to be a lot of duplication, and  
> would make it difficult to change artist or gallery information if  
> required. Maybe this is how it should be done?
>  
> For this small app, this is not going to be a performance issue. I am  
> trying rather to break away from a SQL/relational mindset and understand  
> how this would be better designed in a database like couch.
>  
> Thanks,
>  
> Steve  


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message