incubator-couchdb-user mailing list archives

From Chris Anderson <jch...@apache.org>
Subject Re: Would there be a problem with storing documents with this structure?
Date Mon, 31 Aug 2009 21:39:04 GMT
On Mon, Aug 31, 2009 at 12:57 PM, Dale Ragan<dale.ragan@sinesignal.com> wrote:
> Jesse Hallett wrote:
>>
>> On Sun, Aug 30, 2009 at 5:21 PM, Chris Anderson<jchris@apache.org>  wrote:
>>
>>>
>>> On Sun, Aug 30, 2009 at 5:10 PM, Tom Sante<tom.sante@gmail.com>  wrote:
>>>
>>>>
>>>> On Sun, Aug 30, 19:11, Dale Ragan wrote:
>>>>
>>>>>>>
>>>>>>> Basically I have a document, with an id, rev, type, and Content
>>>>>>> keys.  The Content key
>>>>>>> holds the serialized object that is to be stored as its value.
>>>>>>> Are there any pitfalls
>>>>>>> with this design?  I have attached a sample below:
>>>>>>>
>>>>>>
>>>>>> I should say I'm in no way an expert, I'm starting to wrap my head
>>>>>> around document modelling myself. I've been reading up on couchdb
>>>>>> a couple of days now and find it really interesting.
>>>>>>
>>>>>> Anyway, on to your document. First, why duplicate the manager id?
>>>>>> Isn't there a risk of them getting out of sync?
>>>>>>
>>>>>
>>>>> There is no chance that the Ids will get out of sync. I handle
>>>>> generating the Ids when the object is persisted for the first time.
>>>>>
>>>>>>
>>>>>> I think you will run into many conflicts if subordinates are
>>>>>> updated independently. Each subordinate has an id, is there
>>>>>> another document with more information about subordinates? In that
>>>>>> case, why not have all information in there and connect them with
>>>>>> a managerId attribute instead?
>>>>>>
>>>>>
>>>>> This is just an example object that I modeled up for the post.
>>>>> Subordinates in this case are updated another way.  They are just
>>>>> referenced by the Manager object.  Basically, a one-to-many
>>>>> relationship.  If you wanted to update one, you would use a document
>>>>> that wrapped the Worker object.  Is it better to normalize the data
>>>>> even in CouchDB?
>>>>>
>>>>> I am new to CouchDB also.  I am trying to abstract any need for a
>>>>> domain model needing to know about CouchDB's terms, like Rev.  I am
>>>>> writing an API in a statically typed language and I am experimenting
>>>>> with the best way to store the object that is given to my API.  This
>>>>> design helps and is one of the few I have come up with.
>>>>>
>>
>> Putting serialized data inside a 'Content' attribute is a good way to
>> go.  I have seen the same pattern recommended elsewhere.  It lets you
>> serialize arbitrary data without having collisions with metadata;
>> specifically the '_id', '_rev', and 'type' attributes.  And map
>> functions can pull any indexable data out of nested attributes, so I
>> don't think this approach has any particular performance implications.
>>
>
> Thanks, I think I might settle on this approach then, unless there are
> objections, for example if there would be problems with validation?
>

No objections here. The only reason I can think of not to keep data in
an application-specific nested subfield is the slight added complexity
of having to reference it in all your application code. That
complexity may be well worth it if collisions with other top-level
field names are at all a potential source of trouble.
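
A minimal sketch of what that referencing looks like in a view, assuming
the document shape from the sample quoted below. The runMap helper and
the byLogin name are hypothetical and exist only so the snippet runs on
its own; inside CouchDB the map function takes just (doc) and emit() is
supplied by the view server.

```javascript
// Hypothetical harness: collects whatever a CouchDB-style map function emits.
function runMap(mapFn, docs) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  docs.forEach((doc) => mapFn(doc, emit));
  return rows;
}

// Map function that reaches into the nested Content field.
// emit is passed as a parameter here only to keep the sketch standalone.
function byLogin(doc, emit) {
  if (doc.type === "Manager" && doc.Content && doc.Content.Login) {
    emit(doc.Content.Login, doc.Content.Name);
  }
}

const sampleDocs = [
  {
    _id: "000144df-6f11-49f1-a502-e0dab3592326",
    type: "Manager",
    Content: {
      Id: "000144df-6f11-49f1-a502-e0dab3592326",
      Name: "6",
      Login: "6-login",
    },
  },
  // A document of another type is skipped by the type check above.
  { _id: "other-doc", type: "Worker", Content: { Name: "Bob", Login: "bbob" } },
];

const loginRows = runMap(byLogin, sampleDocs);
console.log(loginRows);
```

The only cost of the wrapper is the extra doc.Content. prefix on every
field access.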

>>
>>>>>>>
>>>>>>> {
>>>>>>>  "_id": "000144df-6f11-49f1-a502-e0dab3592326",
>>>>>>>  "_rev": "1-308931e16105b566e1fb48106c85116e",
>>>>>>>  "type": "Manager",
>>>>>>>  "Content": {
>>>>>>>      "Subordinates": [
>>>>>>>          {
>>>>>>>              "Address": {
>>>>>>>                  "Street": "123 Somewhere St.",
>>>>>>>                  "City": "Kalamazoo",
>>>>>>>                  "State": "MI",
>>>>>>>                  "Zip": "12345"
>>>>>>>              },
>>>>>>>              "Hours": 40,
>>>>>>>              "Id": "6bcdea2f-2439-4785-ab59-2ee612435705",
>>>>>>>              "Name": "Bob",
>>>>>>>              "Login": "bbob"
>>>>>>>          },
>>>>>>>          {
>>>>>>>              "Address": {
>>>>>>>                  "Street": "123 Somewhere St.",
>>>>>>>                  "City": "Kalamazoo",
>>>>>>>                  "State": "MI",
>>>>>>>                  "Zip": "12345"
>>>>>>>              },
>>>>>>>              "Hours": 40,
>>>>>>>              "Id": "b0d156c9-ea3f-4c4f-b49d-ab19bff64dd8",
>>>>>>>              "Name": "Alice",
>>>>>>>              "Login": "aalice"
>>>>>>>          },
>>>>>>>          {
>>>>>>>              "Address": {
>>>>>>>                  "Street": "123 Somewhere St.",
>>>>>>>                  "City": "Kalamazoo",
>>>>>>>                  "State": "MI",
>>>>>>>                  "Zip": "12345"
>>>>>>>              },
>>>>>>>              "Hours": 20,
>>>>>>>              "Id": "12b6dbbc-44e8-43c2-8142-11fc6c1d23df",
>>>>>>>              "Name": "Eve",
>>>>>>>              "Login": "eeve"
>>>>>>>          }
>>>>>>>      ],
>>>>>>>      "Id": "000144df-6f11-49f1-a502-e0dab3592326",
>>>>>>>      "Name": "6",
>>>>>>>      "Login": "6-login"
>>>>>>>  }
>>>>>>> }
>>>>>>>
>>>>>>> Basically the content is a Manager type object with an Id, Name,
>>>>>>> Login, and Subordinates.
>>>>>>> Subordinates are Workers with an Id, Name, Login, Hours, and an
>>>>>>> Address.  The _id and the Id of
>>>>>>> the Manager object are the same.  Basically the Document object
>>>>>>> is just a wrapper around what is
>>>>>>> given to be persisted.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Dale
>>>>>>>
>>>>
>>>> Like Martin said, why all this duplication?
>>>> Give each worker its own document and only add the ids of the
>>>> workers as subordinates. That way you can change worker details
>>>> without having to change the manager document.
>>>>
>>>
>>> if you put the manager_id on the worker, then you can pull out a
>>> manager and all its workers in a single query if you like, using just
>>> a map view.
>>>
>>> here's the canonical write up of the technique:
>>>
>>> http://www.cmlenz.net/archives/2007/10/couchdb-joins
>>>
>>>
>>>>
>>>> It might even be better to only store the manager's own info in the
>>>> manager doc and save any worker-manager relations in the respective
>>>> worker document, by referencing the manager id in the worker doc
>>>> along with how many hours he worked for that manager.
>>>> This makes it easier if a worker changes to work for another manager:
>>>> you just reference the new manager id in the worker doc, while still
>>>> keeping the history of the managers that worker had in the past.
>>>>
>
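
For the manager_id technique mentioned in the quoted text, here is a
rough sketch of the map function. It assumes each worker lives in its
own document with type "Worker" and keeps its manager's id in a
hypothetical Content.ManagerId field. The queryView helper is also
hypothetical: it only imitates, in a simplified way, how CouchDB sorts
view rows by key and restricts them to a key range.

```javascript
// Emits the manager under [managerId, 0] and each worker under
// [managerId, 1], so the manager row sorts directly before its workers.
// In CouchDB the signature is function (doc) and emit() is a global.
function managerWithWorkers(doc, emit) {
  if (!doc.Content) return;
  if (doc.type === "Manager") {
    emit([doc._id, 0], doc.Content.Name);
  } else if (doc.type === "Worker" && doc.Content.ManagerId) {
    emit([doc.Content.ManagerId, 1], doc.Content.Name);
  }
}

// Hypothetical stand-in for querying the view for one manager.
// It handles only the two-element [string, number] keys used above.
function queryView(mapFn, docs, managerId) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  docs.forEach((doc) => mapFn(doc, emit));
  return rows
    .filter((row) => row.key[0] === managerId)
    .sort((a, b) => a.key[1] - b.key[1]);
}

const allDocs = [
  { _id: "w1", type: "Worker", Content: { Name: "Bob", ManagerId: "m1" } },
  { _id: "m1", type: "Manager", Content: { Name: "6", Login: "6-login" } },
  { _id: "w2", type: "Worker", Content: { Name: "Alice", ManagerId: "m1" } },
  { _id: "w3", type: "Worker", Content: { Name: "Eve", ManagerId: "m2" } },
];

const joined = queryView(managerWithWorkers, allDocs, "m1");
console.log(joined.map((row) => row.value));
```

Against a real view, the same range would be requested with
startkey=["m1"] and endkey=["m1", {}].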



-- 
Chris Anderson
http://jchrisa.net
http://couch.io
