incubator-couchdb-user mailing list archives

From Jesse Hallett <halle...@gmail.com>
Subject Re: Would there be a problem with storing documents with this structure?
Date Mon, 31 Aug 2009 00:50:31 GMT
On Sun, Aug 30, 2009 at 5:21 PM, Chris Anderson<jchris@apache.org> wrote:
> On Sun, Aug 30, 2009 at 5:10 PM, Tom Sante<tom.sante@gmail.com> wrote:
>> On Sun, Aug 30, 19:11, Dale Ragan wrote:
>>> >
>>> >>Basically I have a document, with an id, rev, type, and Content
>>> >>keys.  The Content key
>>> >>holds the serialized object that is to be stored for its value.
>>> >>Are there any pitfalls
>>> >>with this design?  I have attached a sample below:
>>> >I should say I'm in no way an expert, I'm starting to wrap my head
>>> >around document modelling myself. I've been reading up on couchdb
>>> >a couple of days now and find it really interesting.
>>> >
>>> >Anyway, on to your document. First, why duplicate the manager id?
>>> >Isn't there a risk of them getting out of sync?
>>> There is no chance that the Ids will get out of sync. I handle
>>> generating the Ids when the object is persisted for the first time.
>>> >
>>> >I think you will run into many conflicts if subordinates are
>>> >updated independently. Each subordinate has an id, is there
>>> >another document with more information about subordinates? In that
>>> >case, why not have all information in there and connect them with
>>> >a managerId attribute instead?
>>> This is just an example object that I modeled up for the post.
>>> Subordinates in this case are updated another way.  They are just
>>> referenced by the Manager object.  Basically, a one-to-many
>>> relationship.  If you wanted to update one, you would use a document
>>> that wrapped the Worker object.  Is it better to normalize the data
>>> even in CouchDB?
>>>
>>> I am new to CouchDB also.  I am trying to abstract any need for a
>>> domain model needing to know about CouchDB's terms, like Rev.  I am
>>> writing an API in a statically typed language and I am experimenting
>>> with the best way to store the object that is given to my API.  This
>>> design helps and is one of the few I have come up with.

Putting serialized data inside a 'Content' attribute is a good way to
go.  I have seen the same pattern recommended elsewhere.  It lets you
serialize arbitrary data without colliding with metadata,
specifically the '_id', '_rev', and 'type' attributes.  Map functions
can pull any indexable data out of nested attributes, so I don't think
this approach has any particular performance implications.
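
As a rough illustration of that point, here is a minimal sketch of a
map function that indexes subordinates by login from inside the nested
'Content' attribute.  It assumes the document shape from the sample
quoted below.  In CouchDB, emit is provided by the view server; here it
is passed in as a parameter, and runMap is a hypothetical stand-in for
the view server so the sketch runs as plain JavaScript.

```javascript
// Sketch of a map function that reaches into the nested 'Content' attribute.
// Assumption: `emit` is passed in explicitly; inside CouchDB it is a global.
function mapSubordinatesByLogin(doc, emit) {
  if (doc.type !== "Manager" || !doc.Content) return;
  var subordinates = doc.Content.Subordinates || [];
  subordinates.forEach(function (worker) {
    // Key: the worker's login. Value: a small summary pointing back at the manager.
    emit(worker.Login, { manager: doc._id, name: worker.Name });
  });
}

// Hypothetical stand-in for the view server: runs the map function over
// some documents and returns the emitted rows ordered by key.
// Plain string comparison is a simplification of CouchDB's collation.
function runMap(mapFn, docs) {
  var rows = [];
  docs.forEach(function (doc) {
    mapFn(doc, function (key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  rows.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  return rows;
}

// Abbreviated version of the sample Manager document.
var sampleDoc = {
  _id: "000144df-6f11-49f1-a502-e0dab3592326",
  type: "Manager",
  Content: {
    Subordinates: [
      { Id: "6bcdea2f-2439-4785-ab59-2ee612435705", Name: "Bob", Login: "bbob" },
      { Id: "b0d156c9-ea3f-4c4f-b49d-ab19bff64dd8", Name: "Alice", Login: "aalice" }
    ],
    Name: "6",
    Login: "6-login"
  }
};

var loginRows = runMap(mapSubordinatesByLogin, [sampleDoc]);
console.log(loginRows);
```

The wrapper costs nothing at index time: the map function simply
addresses doc.Content.Subordinates instead of doc.Subordinates.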

>>> >>{
>>> >>  "_id": "000144df-6f11-49f1-a502-e0dab3592326",
>>> >>  "_rev": "1-308931e16105b566e1fb48106c85116e",
>>> >>  "type": "Manager",
>>> >>  "Content": {
>>> >>      "Subordinates": [
>>> >>          {
>>> >>              "Address": {
>>> >>                  "Street": "123 Somewhere St.",
>>> >>                  "City": "Kalamazoo",
>>> >>                  "State": "MI",
>>> >>                  "Zip": "12345"
>>> >>              },
>>> >>              "Hours": 40,
>>> >>              "Id": "6bcdea2f-2439-4785-ab59-2ee612435705",
>>> >>              "Name": "Bob",
>>> >>              "Login": "bbob"
>>> >>          },
>>> >>          {
>>> >>              "Address": {
>>> >>                  "Street": "123 Somewhere St.",
>>> >>                  "City": "Kalamazoo",
>>> >>                  "State": "MI",
>>> >>                  "Zip": "12345"
>>> >>              },
>>> >>              "Hours": 40,
>>> >>              "Id": "b0d156c9-ea3f-4c4f-b49d-ab19bff64dd8",
>>> >>              "Name": "Alice",
>>> >>              "Login": "aalice"
>>> >>          },
>>> >>          {
>>> >>              "Address": {
>>> >>                  "Street": "123 Somewhere St.",
>>> >>                  "City": "Kalamazoo",
>>> >>                  "State": "MI",
>>> >>                  "Zip": "12345"
>>> >>              },
>>> >>              "Hours": 20,
>>> >>              "Id": "12b6dbbc-44e8-43c2-8142-11fc6c1d23df",
>>> >>              "Name": "Eve",
>>> >>              "Login": "eeve"
>>> >>          }
>>> >>      ],
>>> >>      "Id": "000144df-6f11-49f1-a502-e0dab3592326",
>>> >>      "Name": "6",
>>> >>      "Login": "6-login"
>>> >>  }
>>> >>}
>>> >>
>>> >>Basically the content is a Manager type object with an Id, Name,
>>> >>Login, and Subordinates.
>>> >>Subordinates are Workers with an Id, Name, Login, Hours, and an
>>> >>Address.  The _id and the Id of
>>> >>the Manager object are the same.  Basically the Document object
>>> >>is just a wrapper around what is
>>> >>given to be persisted.
>>> >>
>>> >>Thanks,
>>> >>
>>> >>Dale
>>
>> Like Martin said, why all this duplication?
>> Give each worker its own document and only add the ids of the
>> workers as subordinates. That way you can change worker details
>> without having to change the manager document.
>
> if you put the manager_id on the worker, then you can pull out a
> manager and all its workers in a single query if you like, using just
> a map view.
>
> here's the canonical write up of the technique:
>
> http://www.cmlenz.net/archives/2007/10/couchdb-joins
>
>>
>> It might even be better to store only the manager's own info in the
>> manager doc and save any worker-manager relations in the respective
>> worker document, by referencing the manager id in the worker doc along
>> with how many hours he worked for that manager.
>> If a worker moves to another manager, you just reference the new
>> manager id in the worker doc, while still keeping the history of the
>> managers that worker had in the past.
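
For anyone following along, the single-query idea from the quoted
replies can be sketched roughly as follows.  This is only an
illustration under assumptions: the top-level manager_id field on
worker documents, the short ids, and the buildView and rowsForManager
helpers are all hypothetical, and the sort is a simplified stand-in
for CouchDB's key collation.  Inside CouchDB, emit is a global rather
than a parameter.

```javascript
// Map function: the manager and its workers share the first key element,
// so they end up next to each other in the view.
function mapManagerWithWorkers(doc, emit) {
  if (doc.type === "Manager") {
    emit([doc._id, 0], null);           // 0 sorts the manager row first
  } else if (doc.type === "Worker" && doc.manager_id) {
    emit([doc.manager_id, 1], null);    // 1 places worker rows after it
  }
}

// Hypothetical stand-in for the view index. It only understands
// [string, number] keys and breaks ties by document id.
function buildView(mapFn, docs) {
  var rows = [];
  docs.forEach(function (doc) {
    mapFn(doc, function (key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  rows.sort(function (a, b) {
    if (a.key[0] !== b.key[0]) return a.key[0] < b.key[0] ? -1 : 1;
    if (a.key[1] !== b.key[1]) return a.key[1] - b.key[1];
    return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
  });
  return rows;
}

// Stand-in for querying the view with startkey=[managerId] and
// endkey=[managerId, {}]: keep every row whose key starts with managerId.
function rowsForManager(rows, managerId) {
  return rows.filter(function (row) { return row.key[0] === managerId; });
}

// Each worker is its own document that points at its manager.
var collationDocs = [
  { _id: "w-bob", type: "Worker", manager_id: "m-6", Content: { Name: "Bob" } },
  { _id: "m-6", type: "Manager", Content: { Name: "6" } },
  { _id: "w-alice", type: "Worker", manager_id: "m-6", Content: { Name: "Alice" } },
  { _id: "w-zed", type: "Worker", manager_id: "m-7", Content: { Name: "Zed" } }
];

var managerRows = rowsForManager(buildView(mapManagerWithWorkers, collationDocs), "m-6");
var managerRowIds = managerRows.map(function (row) { return row.id; });
console.log(managerRowIds);
```

With this layout a worker can be updated without touching the manager
document, and the manager row still arrives ahead of its workers in
one range query.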
