couchdb-user mailing list archives

From Brad Schick <schi...@gmail.com>
Subject Re: Modifying fields
Date Thu, 12 Jun 2008 23:40:09 GMT
Thanks for the feedback.


On 06/12/2008 03:21 PM, Jan Lehnardt wrote:
>> Follow-up questions on this: Does CouchDB internally track and reference
>> individual fields? Or is the JSON for each document basically a blob to
>> everything except view code?
>
> Documents are stored as native Erlang types representing each
> document. Except for the view server, no one cares what a document
> looks like.
So the DB just treats each document like a string, I assume? I was hoping
it actually understood the fields. If it doesn't know about fields, then
I understand that doing things on the server might not be much more
efficient.

But I'm curious: if the Erlang code doesn't look inside documents, why do
I get errors when I pass just a JSON array as the body of a document? It
seems to require a JSON object with named pairs.
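
For instance, this is what I'm seeing (a quick sketch against a local
CouchDB; the database name 'testdb' and the helper are just for
illustration):

import json
import urllib.error
import urllib.request

BASE = "http://localhost:5984/testdb"  # assumes a local CouchDB with a 'testdb' database

def put_doc(doc_id, body):
    # PUT an arbitrary JSON body as a document and report what happens.
    req = urllib.request.Request(
        BASE + "/" + doc_id,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            print(doc_id, "->", resp.status)
    except urllib.error.HTTPError as err:
        print(doc_id, "-> rejected:", err.code)

put_doc("as_array", [1, 2, 3])               # top-level array: error
put_doc("as_object", {"values": [1, 2, 3]})  # object with named pairs: works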

>> (caveat: I know little about CouchDB internals, so the following is
>> based on how I assume it might work)
>>
>> To complement Views, how about a concept of Modifier scripts? These
>> would work in two separate stages. First, a map stage would build an
>> index similar to Views. If CouchDB is able to reference individual
>> fields, the map would emit a key and field names for each document. If
>> CouchDB is only able to reference documents, the map would emit just a
>> key for each document. Then there would be a 'modify' stage, run when
>> the modifier's URI was POSTed to. The modify function would accept
>> arbitrary JSON from the POST, key(s), and either individual fields (if
>> possible) or entire document(s) (if not). I'd assume the modify
>> function would have to be called either once per key or with blocks of
>> keys to avoid holding everything in memory at once.
>
> Why not just do post-modify on the client with caching? I don't really
> see the need to add that to the DB server. Note also that the map
> output can optionally be reduced (and rereduced), which allows further
> computations.
I can't cache all of the documents on all of the front-end servers (aka
clients); at any given time most of them won't be in a cache. In my idea
above, I don't think reduce would apply, since the goal is just finding
the docs to update rather than computing a result.
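
To make the idea concrete, here is a rough sketch of the two stages in
Python. Everything here is hypothetical (the function names, signatures,
and the toy driver standing in for the server are all made up); it's only
meant to show the control flow:

import html

# Hypothetical 'map' stage: emit a key for each document that needs changing.
def map_fn(doc):
    if "textfield" in doc:
        yield doc["_id"]

# Hypothetical 'modify' stage: receives a matched document plus the arbitrary
# JSON that was POSTed to the modifier's URI, and returns the updated document.
def modify_fn(doc, request_body):
    doc["textfield"] = html.escape(doc["textfield"])
    return doc

# Toy driver standing in for the server: build the index, then call the
# modify stage once per key (a real server might pass blocks of keys to
# avoid holding everything in memory at once).
def run_modifier(docs, request_body):
    keys = [key for doc in docs.values() for key in map_fn(doc)]
    for key in keys:
        docs[key] = modify_fn(docs[key], request_body)

docs = {
    "a": {"_id": "a", "textfield": "x < y"},
    "b": {"_id": "b"},
}
run_modifier(docs, {})
print(docs["a"]["textfield"])  # x &lt; y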

If the documents are opaque to Erlang, I understand the savings aren't
that great. But along with bandwidth, a server-side modifier would also
save the CPU overhead of sending and receiving data (I believe I read
somewhere that HTTP processing is a significant component of the DB's
total CPU use).

My data model is still a work in progress, so perhaps I won't really
need to update lots of documents in sequence. Mostly I've been thinking
of maintenance examples: say every value of some textfield needs to be
escaped and written back after the DB already contains a million
documents with that textfield.
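
For comparison, here's roughly what the client-side version of that job
looks like today (a sketch; the database name 'mydb' is assumed, and it
fetches every document individually, which is exactly the round-tripping
I'd like to push into the server):

import html
import json
import urllib.request

BASE = "http://localhost:5984/mydb"  # assumed database name

def get_json(url):
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())

# Walk every document id, pull the full doc, escape the field, write it back.
# The fetched doc includes its _rev, so the PUT is accepted as an update.
for row in get_json(BASE + "/_all_docs")["rows"]:
    doc = get_json(BASE + "/" + row["id"])
    if "textfield" in doc:
        doc["textfield"] = html.escape(doc["textfield"])
        req = urllib.request.Request(
            BASE + "/" + doc["_id"],
            data=json.dumps(doc).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="PUT",
        )
        urllib.request.urlopen(req)

The writes could be batched through _bulk_docs, but either way every
document makes a full round trip over HTTP.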

It would be interesting to know the load on the DB of doing something
like that inside the server versus sending all million documents to the
client and back.


-Brad
