incubator-couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Steven Ringo <goo...@stevenringo.com>
Subject Database design question
Date Tue, 27 Dec 2011 21:31:27 GMT
Hi all,

(I originally posted this on the mobile-couchbase list, but was 
recommended to post it here instead, since this is more of a general 
couchdb question)

I am struggling with something that I know would be quite trivial to do 
in SQL, but seems to be a lot more complex to do in couchdb. Perhaps 20 
years of experience with SQL is clouding my judgement and so I am trying 
rather to break away from the SQL/relational mindset and understand how 
this would be better designed in couch.

I am developing an iOS art gallery app, and wish to display information 
about artworks, artists and galleries.

The menu screen requires a list of all artworks together with the 
gallery they are currently housed at and the artist that created the 
artwork:

  * artwork title
  * artwork medium
  * artwork dimensions
  * artist full_name
  * gallery name,
  * gallery city

for each artwork, respectively.

Imagine the following objects in my database (I am not including all the 
fields for sake of brevity. i.e. only the ones that are important).

|     Artist
     ------
     _id
     full_name
     birth_year
     birth_place

     Gallery
     -------
     _id
     name
     city

     Artwork
     -------
     _id
     title
     medium
     dimensions
     artist_id_ (foreign key)
     gallery_id_ (foreign key)
|

In SQL it would be as easy as one query with two joins:

|SELECT
     artwork.title,
     artwork.medium,
     artwork.dimensions,
     artist.full_name,
     gallery.name,
     gallery.city
FROM
     artwork INNER JOIN artist
     ON artwork.artist_id_ = artist._id INNER JOIN gallery
     ON artwork.gallery_id_ = gallery._id
|

 From this I would be able to get the data I need in one query (i.e. one 
call to the database).

|     Guernica  | Pablo Picasso     | Museo Reina Sofia, Madrid, Spain
     Mona Lisa | Leonardo da Vinci | Musée du Louvre, Paris, France
     David     | Michelangelo      | Uffizi Gallery, Florence, Italy.
|

Using couchdb I am storing the data in a very similar way to the way I 
would in a relational database, represented as something along the lines of:

|{
    "type"       : "Artwork",
    "_id"        : "d665afaedde12b44c5159cf1782c5c1e",
    "_rev"       : "1-446201af8a1ddfa3759164a01be899de",
    "artist_id"  : "d665afaedde12b44c5159cf1782c2d0b",
    "gallery_id" : "d665afaedde12b44c5159cf1782d44af",
    "title"      : "Guernica",
    "medium"     : "Oil on canvas",
    "dimensions" : "349 cm × 776 cm (137.4 in × 305.5 in)"
}

{
    "type"       : "Artist",
    "_id"        : "d665afaedde12b44c5159cf1782c2d0b",
    "_rev"       : "1-8fa8a6dd50d9d9072b08db1a840128b1",
    "full_name"  : "Pablo Picasso",
    "birth_year" : "1881"
}

{
    "type"       : "Gallery",
    "_id"        : "d665afaedde12b44c5159cf1782d44af",
    "_rev"       : "1-27a139e40a7f88d6465812eec3c73a0f",
    "name"       : "Museo Reina Sofia"
}
|

To get the same result in couch, I would have to do one query to get all 
the artworks, e.g.

|function(doc) {
   if(doc.type&&  doc.type == 'Artist') {
     emit(doc._id,doc);
   }
}
|

then I would need to do a |GET| for the _id of each |Gallery| and 
|Artist| returned per |Artwork| respectively to fill in the missing 
information. This is usually known as the |n+1| queries problem (in this 
case |2n+1|).

I have always worked on the assumption that the database should be 
leveraged to do as much of the heavy lifting as possible with as few 
network hits as possible (i.e. SQL is always faster than client 
manipulation). With a list of 120 artworks, this  means 241 HTTP GETs to 
the database.

I am aware of "linked documents" in views 
(http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views#Linked_documents), 
however this seems to only work with one related row.

I can't really see how "view collation" 
(http://wiki.apache.org/couchdb/View_collation) would work here either. 
With view collation the correlated information ends up being on 
different rows, and I think that would be better suited for where say a 
single Artists or Gallery is displayed with associated Artworks. My case 
is the opposite, whereby Artworks plus corresponding information from 
Artists and Galleries is required.

I am also aware that one could embed each gallery or artist in each 
artwork or vice versa, but that seems to be a lot of duplication, and 
would make it difficult to change artist or gallery information if 
required. Maybe this is how it should be done?

For this small app, this is not going to be a performance issue. I am 
trying rather to break away from a SQL/relational mindset and understand 
how this would be better designed in a database like couch.

Thanks,

Steve


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message