incubator-couchdb-user mailing list archives

From Mike Kimber <mkim...@kana.com>
Subject RE: when couchdb is not right for my use case
Date Thu, 10 May 2012 13:23:21 GMT
Bryan,

I have a similar situation. As you are probably seeing, when you do a read it takes a while
for the map reduce view to catch up (even a basic 2 x NVP emit with no reduce; document sizes
range from 100k to 7.5MB and we have 90K of them). I've got round this by using the "?stale=update_after"
parameter, but this is not my real issue. I want to discover useful information from our logs,
i.e. I don't know what I need from the logs yet. However, the feedback loop for doing this
discovery seems to be very long, i.e. review document, write JavaScript, build view (on a
very, very small and unrepresentative data set, i.e. 10 documents) and repeat. This takes a
long time even before I apply it to the real data set, which then takes 6 hours to
build the view. So in summary:

1. Takes way too long to re-build and update views
2. Feedback loop is long for information discovery
3. High write low read on largish documents means a high query latency (map reduce catchup)
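
For anyone who wants to try the "?stale=update_after" workaround mentioned above, here is a
minimal sketch in Python of how such a view query could be built and fetched. The host, database,
design document and view names are made up for illustration and are not from my setup; only the
stale=update_after parameter is what I actually use.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def view_url(base, db, design_doc, view, stale="update_after", **params):
    """Build a CouchDB view query URL.

    With stale=update_after the server answers from the existing
    (possibly out-of-date) index and triggers the index update afterwards,
    so the read does not wait for the view to catch up.
    """
    query = dict(params)
    if stale is not None:
        query["stale"] = stale
    path = "/".join([
        base.rstrip("/"),
        quote(db, safe=""),
        "_design",
        quote(design_doc, safe=""),
        "_view",
        quote(view, safe=""),
    ])
    return path + ("?" + urlencode(query) if query else "")


def fetch_view(url):
    """Fetch a view over HTTP and decode the JSON body (needs a running CouchDB)."""
    with urlopen(url) as response:
        return json.load(response)


# Hypothetical names, for illustration only.
example = view_url("http://localhost:5984", "logs", "reports", "by_status", limit=10)
```

The trade-off is that a read may return rows that lag behind the latest writes, which is usually
acceptable for internal reporting but does nothing for the view build time itself.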

As you say, there's a lot to like about CouchDB (schemaless, replication, JavaScript queries,
incremental map reduce, RESTful API, it's just plain simple), plus all my data is in it now!
So what I'm currently looking to do is use Couch as a message/document store but add the
following on top of it:

1. couchdb-lucene, to speed up information discovery
2. LucidDB, to pull in the coarse-grained view data created during the discovery phase
and enable ad-hoc querying

No idea if this helps you or not, but you're not alone :-)

Mike 

-----Original Message-----
From: bryan rasmussen [mailto:rasmussen.bryan@gmail.com] 
Sent: 10 May 2012 08:46
To: user@couchdb.apache.org
Subject: when couchdb is not right for my use case

Hi,

I really like working with couchdb, one of the benefits it gives at
the beginning of a project is the ability to play with data, to
determine the right data structure that one actually needs (since I'm
an XML guy this is pretty important to me[I also think couchdb does
this much better than XQuery based DBs - too strongly typed])

So anyway, because I like couchdb I have embarked on an apache/solr
logs analysis project for which couchdb does not seem to be
well-suited (which I knew beforehand but was using couchdb as quick
proof of concept for some of the things I wanted to do.)

the drawbacks are:

logs pile up quickly, so the project is write intensive. Since the
data is being used internally for reports it is not likely to be read
intensive.
Should not need any revision management.
A lot of the benefits of db replication will not be useful.
Lots of views to data need to be provided.

So has anyone ever had a similar situation, and what did you move to
as your DB? Or how did you structure your couchdb solution to make it
more suitable?

Thanks,
Bryan Rasmussen