incubator-couchdb-user mailing list archives

From Cliff Williams <cliffywi...@aol.com>
Subject Re: Forcing document reindex
Date Wed, 17 Nov 2010 17:23:22 GMT
Nicolas,

I am not sure I fully understand your use case (though it does sound
intriguing and unusual).

A couple of things stick out in your commentary:

"The data is only weakly relational."
"DB updates are relatively few"

I assume that you are getting data out of your legacy MySQL system using
complex joins?

Have you considered fully denormalising your data and loading it into
CouchDB based on the output of your MySQL reports?
Perhaps couchdb-lucene (or my current favourite, elasticsearch, which is
also based on Lucene) would be useful?
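To make the denormalising suggestion concrete, here is a minimal Python sketch. It assumes each category has an `id` and a hypothetical link field pointing to the next category (`b_id`, `c_id`, `d_id`, `e_id`; none of these names come from the thread). It walks the A → B → C → D → E chain once, outside CouchDB, and produces one self-contained report document per A that could then be bulk-loaded (for example through `_bulk_docs`).

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]


def index_by_id(rows: List[Doc]) -> Dict[str, Doc]:
    """Map each row's 'id' to the row itself."""
    return {row["id"]: row for row in rows}


def denormalise(a_rows: List[Doc], b_rows: List[Doc], c_rows: List[Doc],
                d_rows: List[Doc], e_rows: List[Doc]) -> List[Doc]:
    """Build one report document per A row.

    The link fields are hypothetical:
    A.b_id -> B, B.c_id -> C, C.d_id -> D, D.e_id -> E.
    """
    b, c, d, e = (index_by_id(r) for r in (b_rows, c_rows, d_rows, e_rows))
    docs = []
    for a in a_rows:
        b_doc = b[a["b_id"]]
        # Resolve the chain B -> C -> D -> E at import time, so the report
        # never has to join across documents inside CouchDB.
        e_doc = e[d[c[b_doc["c_id"]]["d_id"]]["e_id"]]
        docs.append({
            "_id": "report:" + a["id"],  # hypothetical id scheme
            "type": "report_row",
            "a": a,
            "b": b_doc,
            "e": e_doc,
        })
    return docs
```

With rows shaped like this, a plain view keyed on whatever the report groups by is enough. The trade-off is that a report row has to be regenerated whenever its B or E changes.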

If neither suggestion is of any use, could you post a more
detailed description (with a data sample if possible) of the following:

"The hiccup is reporting. Some of it involves the full set of documents.
Let's say I have 5 categories of documents involved in a report, A to E.
A links to B, B links to C, etc. The report needs data from A, B, and E.
As far as I can think, there's no way to do a view collation, because A
and B share an ID but E doesn't. I can't pull a million documents from the
DB to process elsewhere either, so that nixes simple indexing and the
'_id' object values."


Very best regards

Cliff

On 17/11/10 16:13, Nicolas Jessus wrote:
> All right; no one should like what they're going to read.
>
> I have a medium-sized MySQL system, which translates to a Couch with about a
> million documents of about 20 types. The system would really benefit from a
> schema-free design. The data is only weakly relational. Couch would fit really
> well, enough that I don't mind twisting its arm in a few places if need be; the
> tradeoff would be worth it.
>
> The hiccup is reporting. Some of it involves the full set of documents. Let's
> say I have 5 categories of documents involved in a report, A to E. A links to B,
> B links to C, etc. The report needs data from A, B, and E. As far as I can
> think, there's no way to do a view collation, because A and B share an ID but E
> doesn't. I can't pull a million documents from the DB to process elsewhere
> either, so that nixes simple indexing and the '_id' object values.
>
> I could however write a special view_server that will emit keys after checking
> the linked ID through an HTTP call (that's where you scream). Indexing
> performance is totally unimportant to me, DB updates are relatively few, and I
> can live with the dirty side-effects (again, the system as a whole would still
> be much cleaner than the MySQL one).
>
> With that solution I can have a map function that only handles docs of type A.
> But I still need to reindex the relevant As when B or E changes. I could simply
> listen to the change stream and force a reindex, but that doesn't work well with
> legitimate updates: the _rev number goes up at random even though the doc
> hasn't changed, and there's no auto-merge. So I'm pretty stuck.
>
> I'm not asking that this type of functionality be encouraged. It's clearly
> subverting the point of Couch. On the other hand, it doesn't seem like having a
> force-reindex function would dirty the concept, and if it's easy to code, then
> it's a shame it doesn't exist.
>
>
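Whatever triggers the reindex, something has to map a changed B or E back to the A documents that depend on it. A minimal Python sketch, assuming the A → B and A → E links have already been resolved into a hypothetical `deps` mapping (for instance while walking the chain at import time): it inverts those links so that each document id seen on the `_changes` feed can be turned into the set of A ids that need refreshing.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_reverse_links(deps: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Invert {a_id: ids it depends on} into {dependency_id: {a_ids}}."""
    reverse: Dict[str, Set[str]] = defaultdict(set)
    for a_id, dep_ids in deps.items():
        for dep_id in dep_ids:
            reverse[dep_id].add(a_id)
    return dict(reverse)


def affected_as(changed_ids: Iterable[str],
                reverse: Dict[str, Set[str]]) -> List[str]:
    """Return the A ids whose indexed output may be stale after these changes."""
    hit: Set[str] = set()
    for doc_id in changed_ids:
        # Ids with no dependants (e.g. unrelated doc types) contribute nothing.
        hit |= reverse.get(doc_id, set())
    return sorted(hit)
```

Re-saving each affected A (an ordinary update that bumps its `_rev`) makes CouchDB run the map function on it again the next time the view is queried. This is a workaround rather than the force-reindex call asked for above, and concurrent legitimate edits to the same A can still produce the conflicts described in the quoted message.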
