couchdb-user mailing list archives

From "Geir Magnusson Jr." <g...@pobox.com>
Subject Re: newbie question #1
Date Sun, 28 Dec 2008 14:15:58 GMT

On Dec 28, 2008, at 9:00 AM, Paul Davis wrote:

> On Sun, Dec 28, 2008 at 8:47 AM, Geir Magnusson Jr. <geir@pobox.com> wrote:
>>
>> On Dec 28, 2008, at 8:26 AM, Paul Davis wrote:
>>>
>>> You're pretty much spot on here. "id" and "key" both refer to the
>>> "_id" field in a document. And the "rev" does indeed refer to the
>>> "_rev" attribute. Why "id" and "rev" are used instead of "_id" and
>>> "_rev" I couldn't really tell you. I hate to say "historical reasons"
>>> but I'm guessing that when Damien designed the view output he just
>>> labeled them "id" and "rev" without the underscore because it's not
>>> needed to distinguish from the rest of the doc.
>>
>> Ok, cool.  So... can key be something else?  Or should I assume that "key"
>> is a synonym for "_id"?
>>
>
> It's a bit misleading because you chose _all_docs as the first view you
> looked at. Really _all_docs is a special internal view that CouchDB
> provides. When you get to defining your own views, you learn that
> views are created by emitting key/value pairs that are arbitrary JSON
> objects (no _id/_rev complaints even). So yes, key can be whatever you
> want when defining a custom view.
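
A minimal sketch of the idea Paul describes, written in Python rather than the JavaScript that CouchDB view functions actually use. The `run_view` helper, the `by_type` map function, and the sample documents are all hypothetical. In real CouchDB, `emit` is supplied by the view server rather than passed in as an argument, and rows are ordered by CouchDB's own collation rules, which plain `sorted` only approximates for simple string keys.

```python
def run_view(docs, map_fun):
    """Toy stand-in for a view: call map_fun on every doc and collect rows."""
    rows = []
    for doc in docs:
        # Each row records the emitting document's id next to the emitted
        # key and value, mirroring the id/key/value shape of view rows.
        def emit(key, value, _id=doc["_id"]):
            rows.append({"id": _id, "key": key, "value": value})

        map_fun(doc, emit)
    # Assumption: keys are mutually comparable (all strings in this example).
    return sorted(rows, key=lambda row: row["key"])


def by_type(doc, emit):
    # The key is whatever the map function chooses, not necessarily "_id".
    emit(doc["type"], doc["title"])


docs = [
    {"_id": "a", "_rev": "1", "type": "post", "title": "x"},
    {"_id": "b", "_rev": "1", "type": "comment", "title": "y"},
]
rows = run_view(docs, by_type)
```

Here each row's key is the document's `type` field, which shows that "key" only coincides with "_id" in the special case of _all_docs.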

I read the view docs (and have other questions there, like whether the M/R
is distributed across a cluster - I've used M/R w/ Hadoop, so I come
w/ a set of assumptions...) and I saw that it doesn't *appear* that
the key or id is injected in the view doc, which of course brings up
an obvious question :)

>
>
>> [SNIP]
>>
>>>> {
>>>>  _id : whatever
>>>>  _rev : whatever
>>>>  doc : { ..... the full user document that can have _id, _rev and
>>>> whatever....}
>>>> }
>>>>
>>>>
>>>
>>> Like Noah says, reserving underscore prefixed fields as private to
>>> CouchDB doesn't make it not JSON. I'd argue that putting the document
>>> stuff inside a doc member would probably be an annoyance in that every
>>> operation on the doc would require doc.doc.foo instead of just doc.foo
>>
>> I certainly understand that there are tradeoffs.  We do the same thing at
>> 10gen - modify the user's document for storage.  Some random thoughts:
>>
>> 1) doing an insert requires that the user document be deserialized (maybe
>> only partially?), the additional fields inserted, and then re-serialized
>> for storage.  Having a metadata envelope means that the user document
>> keyspace and the server's metadata keyspace are totally decoupled.
>>
>
> I fail to see how these two points are related, but at the moment
> partial de/serialization is not done in CouchDB. It's been discussed
> (extensively) and has been more or less put on hold until there is a
> JSON community-supported diff format. Though, come to think of it,
> that'll still require a full de/serialization round trip.

You're right - it's not related from the POV of making it convenient  
to access fields w/o the extra reference hop.  I was just making a  
list of issues related to an envelope...

I'll go look at the dev archive to see if I can get a hint about what  
you are referring to.
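
To make point 1 concrete, here is a rough Python sketch contrasting the two storage shapes. It is not how CouchDB (which is written in Erlang) stores documents; `store_injected` and `store_enveloped` are hypothetical names, and the envelope version assumes the incoming text is already valid JSON, since it is spliced in without being parsed.

```python
import json


def store_injected(raw_json, doc_id, rev):
    """Inject metadata into the user's document: parse, modify, re-serialize."""
    doc = json.loads(raw_json)   # full deserialization of the user document
    doc["_id"] = doc_id          # server metadata shares the user's keyspace
    doc["_rev"] = rev
    return json.dumps(doc)       # full re-serialization


def store_enveloped(raw_json, doc_id, rev):
    """Wrap the user's document in a metadata envelope without parsing it."""
    meta = json.dumps({"_id": doc_id, "_rev": rev})
    # Drop the closing brace of the metadata object and splice the raw user
    # document in verbatim under a "doc" member (assumes raw_json is valid).
    return meta[:-1] + ', "doc": ' + raw_json + "}"


user_json = '{"foo": 1}'
injected = json.loads(store_injected(user_json, "a", "1"))
enveloped = json.loads(store_enveloped(user_json, "a", "1"))
```

Reading a field back shows the tradeoff discussed in this thread: `injected["foo"]` versus `enveloped["doc"]["foo"]`, which is the doc.foo versus doc.doc.foo difference.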

>
>
>> 2) It prevents, or at least makes harder, any document security - any hash
>> function would have to account for the fact that there may be external keys
>> injected into the document ("_*").  This is doable, but it means your code
>> - which was handling "generic JSON" - now has to know that it's working w/
>> a couchdb store....
>>
>
> I don't follow.

Suppose I wanted to ensure that my data isn't modified - I could
produce a cryptographic signature of my JSON doc, add that to the doc,
and then store it.  But when it comes back, it now has two magical
fields added - _id and _rev - which I'd have to remove before
re-calculating my hash.

That's doable of course, but if I had some generalized library for  
doing this, there would have to be special handling when a doc is  
stored in couchdb vs other places (written to disk, tattooed on a  
hamster, whatever...)
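
A minimal Python sketch of that special handling, under several assumptions: the `signature` field name and the `sign`/`verify` helpers are hypothetical, a plain SHA-256 digest stands in for a real cryptographic signature (which would use a keyed HMAC or a public-key scheme), and only top-level underscore-prefixed fields are stripped.

```python
import hashlib
import json

SIGNATURE_FIELD = "signature"  # hypothetical field name chosen for this sketch


def _canonical_bytes(doc):
    # Drop the store's reserved "_"-prefixed fields and our own signature
    # field, then serialize with sorted keys so equal content hashes equally.
    payload = {
        key: value
        for key, value in doc.items()
        if not key.startswith("_") and key != SIGNATURE_FIELD
    }
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def sign(doc):
    """Return a copy of doc with a digest of its content added."""
    signed = dict(doc)
    signed[SIGNATURE_FIELD] = hashlib.sha256(_canonical_bytes(doc)).hexdigest()
    return signed


def verify(doc):
    """Recompute the digest, ignoring injected fields, and compare."""
    expected = hashlib.sha256(_canonical_bytes(doc)).hexdigest()
    return doc.get(SIGNATURE_FIELD) == expected


original = sign({"title": "hello", "count": 1})
# Simulate what comes back from the store: same content plus _id and _rev.
stored = dict(original, _id="abc123", _rev="1-deadbeef")
```

The `startswith("_")` filter is exactly the store-specific knowledge being objected to here: a generic signing library would not otherwise need to know which fields the database adds.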

>
>
>> 3) the doc.doc.foo problem - Is that really a problem?  I haven't worked w/
>> couch yet to understand the common access patterns, but it seems that the
>> different calls to the REST API return things of different "shape" anyway...
>> if you are accessing by document id, you could just get the user doc back,
>> and it seems that other queries return metadata anyway (e.g. _all_docs) so
>> people must be used to pulling the user doc out of the framing data....  You
>> could solve the issue in MR easily as well.
>>
>
> It's not a *problem*; it'd just annoy me to have to type doc.doc.foo
> instead of doc.foo.

Of course.  And I think that things that annoy me are problems :)

>
>
>> Anyway, I don't want this to distract :)  It's just a subject I'm interested
>> in, as it's a personal pet peeve...
>>
>> geir
>>
>>
>>>
>>>
>>> HTH,
>>> Paul Davis
>>
>>
>
> Apologies if I seem confused. I haven't slept in a long time.

All is well - thanks for the help.   I'll keep reading and playing.

geir

>
>
> HTH,
> Paul Davis

