couchdb-user mailing list archives

From Mike Kimber <mkim...@kana.com>
Subject RE: Am I doing something fundamentally wrong?
Date Fri, 25 May 2012 19:36:50 GMT
Ladislav,

For someone who "doesn't really understand" CouchDB, your points make a lot of sense to me :-)

As you say, unless we write our MR in Erlang it's probably not a great fit for our document
size. But provided we don't have to rebuild the views too often, it does make for a good document
store and ETL mechanism for feeding an OLAP engine.

Thanks, this and Matthieu's responses help greatly.

Mike 

-----Original Message-----
From: Ladislav Thon [mailto:ladicek@gmail.com] 
Sent: 25 May 2012 16:17
To: user@couchdb.apache.org
Subject: Re: Am I doing something fundamentally wrong?

>
> Clearly I seem to be a bit of a lone voice on this as everyone skirts
> around the why do views take so long to build, why do they only run on one
> CPU and why do they take up so much space


I don't really understand CouchDB, so I'm not afraid to (try to) answer
this and be (quite possibly) wrong :-)

1. "why do views take so long to build" -- because for every document, a
JavaScript function has to be executed. This function is executed by a
separate process (view server), in this case the "couchjs" process. As your
documents are fairly large, this might incur significant
serialization/deserialization overhead. Views can also be written in
Erlang, AFAIK, and I bet that would be a hell of a lot faster.
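
To get a rough feel for that overhead, here is a minimal Python sketch (not CouchDB code) that times a JSON round trip for a small and a large document. It assumes the hand-off to the view server is a JSON encode on one side and a decode on the other; the document shapes and the names round_trip and time_round_trips are made up for illustration.

```python
import json
import time


def round_trip(doc):
    """Encode a document to JSON and decode it again, as one hand-off
    between the database and an external view server would (assumed)."""
    return json.loads(json.dumps(doc))


def time_round_trips(doc, repeats=100):
    """Return the average number of seconds one round trip takes."""
    start = time.perf_counter()
    for _ in range(repeats):
        round_trip(doc)
    return (time.perf_counter() - start) / repeats


# Hypothetical documents: one tiny, one with a few thousand nested rows.
small_doc = {"_id": "a", "value": 1}
large_doc = {"_id": "b", "rows": [{"n": i, "label": "x" * 50} for i in range(5000)]}

if __name__ == "__main__":
    print("small:", time_round_trips(small_doc))
    print("large:", time_round_trips(large_doc))
```

The absolute numbers will differ from what couchjs sees, but the cost grows with document size and is paid once per document on every view build.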

2. "why do they only run on one CPU" -- because order of processing still
matters here (to make incremental view updates possible), even if it is
called "map reduce". It would surely be possible to write a parallel
implementation, but it's trickier than it looks at first sight.
No one has done it yet.
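
One way to picture the ordering constraint is the following Python sketch. It is my simplification, not how CouchDB is implemented: the view remembers a single "last processed sequence number" and applies changes strictly in sequence order, so a later update only has to look at what came after that checkpoint. The class IncrementalView and the map function by_type are hypothetical.

```python
class IncrementalView:
    """Toy incremental view: one checkpoint, changes applied in order."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.last_seq = 0        # highest sequence number already indexed
        self.rows_by_doc = {}    # doc id -> rows emitted for that doc

    def update(self, changes):
        """Apply (seq, doc) pairs given in increasing seq order."""
        for seq, doc in changes:
            if seq <= self.last_seq:
                continue  # already indexed, skip
            if doc.get("_deleted"):
                self.rows_by_doc.pop(doc["_id"], None)
            else:
                # Replace whatever an older revision of this doc emitted.
                self.rows_by_doc[doc["_id"]] = list(self.map_fn(doc))
            self.last_seq = seq

    def rows(self):
        """Return all emitted (key, value) rows sorted by key."""
        return sorted(r for rows in self.rows_by_doc.values() for r in rows)


def by_type(doc):
    # Hypothetical map function: emit (type, id) for every document.
    yield (doc["type"], doc["_id"])


changes = [
    (1, {"_id": "a", "type": "x"}),
    (2, {"_id": "b", "type": "y"}),
    (3, {"_id": "a", "type": "z"}),  # newer revision of doc "a"
]
view = IncrementalView(by_type)
view.update(changes[:2])  # initial build
view.update(changes)      # incremental update: only seq 3 is processed
```

With a single checkpoint like this, handing changes to several workers out of order would require extra bookkeeping to know which parts of the index are up to date, which is presumably part of why it is tricky.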

3. "why do they take up so much space" -- because of CouchDB's append-only
B-tree nature. Your views seem to have pretty random keys (as the keys are
IDs of your documents), which means that a lot of inner nodes have to be
created only to be discarded a few moments later. Note that discarded here
means that no pointer points at them, but they are still lying on the disk.
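
A back-of-the-envelope model of this, as a Python sketch under strong simplifying assumptions (a fixed-shape tree, integer keys, no splits or rebalancing, nothing taken from CouchDB's actual file format): in an append-only tree, changing a leaf means writing a new copy of that leaf and of every ancestor up to the root, and the old copies become garbage. The function nodes_rewritten and its parameters are hypothetical.

```python
import random


def nodes_rewritten(keys, fanout=10, depth=4):
    """Count the distinct tree nodes one batch of key updates must re-append.

    Keys are integers in range(fanout ** depth). Level 1 is the leaf level
    and level `depth` is the root; the node covering a key at a given level
    is identified by (level, key // fanout ** level).
    """
    touched = set()
    for key in keys:
        for level in range(1, depth + 1):
            touched.add((level, key // fanout ** level))
    return len(touched)


# 100 adjacent keys share almost all of their ancestors:
# 10 leaves + 1 node on each of the 3 levels above = 13 nodes.
sequential_cost = nodes_rewritten(range(100))

# 100 keys scattered over the whole key space share very few ancestors.
rng = random.Random(0)
random_cost = nodes_rewritten(rng.sample(range(10_000), 100))

if __name__ == "__main__":
    print("sequential batch:", sequential_cost)
    print("random batch:    ", random_cost)
```

Every node counted here is appended to the file again, and its previous copy stays on disk until compaction, so scattered keys make the file grow much faster than clustered ones.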

All in all, I'd say that CouchDB isn't exactly a great fit for an OLAP-style
workload.

LT
