couchdb-user mailing list archives

From Nicolas Jessus <>
Subject Re: Forcing document reindex
Date Wed, 17 Nov 2010 18:00:00 GMT
Hello Cliff,

> I am not sure if I fully understand your use case (however it does sound 
> intriguing and unusual).

Sorry, I'll try to be clearer. I should have started with a real case; I just
didn't want to be unnecessarily verbose (failed!).

Consider 5 types of documents:

type: Meeting
_id: M1
meetingProposalID: MP1
date: 2010-09-09

type: MeetingProposal
_id: MP1
projectPartID: PP1
date: 2010-10-10

type: ProjectPart
_id: PP1
projectID: P1

type: Project
_id: P1
clientID: C1

type: Client
_id: C1
name: John

ProjectPart can be denormalised into Project, but let's ignore that.
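Concretely, each document points at the next one by a foreign-key field, so going from a Meeting to its Client is one fetch per hop. A minimal Python sketch, using an in-memory dict as a stand-in for fetching documents by _id (the function name is hypothetical):

```python
# Stand-in for the database: _id -> doc. In reality each lookup below
# would be a separate GET against CouchDB.
docs = {
    "M1":  {"type": "Meeting", "meetingProposalID": "MP1", "date": "2010-09-09"},
    "MP1": {"type": "MeetingProposal", "projectPartID": "PP1", "date": "2010-10-10"},
    "PP1": {"type": "ProjectPart", "projectID": "P1"},
    "P1":  {"type": "Project", "clientID": "C1"},
    "C1":  {"type": "Client", "name": "John"},
}

def client_for_meeting(meeting_id):
    """Walk the chain Meeting -> MeetingProposal -> ProjectPart -> Project
    -> Client, one fetch per hop."""
    proposal = docs[docs[meeting_id]["meetingProposalID"]]
    part = docs[proposal["projectPartID"]]
    project = docs[part["projectID"]]
    return docs[project["clientID"]]
```

So `client_for_meeting("M1")` walks four hops and returns the Client doc for John. Four round trips for one row is exactly what becomes painful at hundreds of thousands of meetings.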

Let's say I would like to know the average time between a meeting proposal and
the actual meeting, per client, to see what kind of delay I should expect. This 
is a simple report; others are much more complex, so I'm really looking to solve
the general-case problem.

Naively, the key should be something like [clientName, dateMP1, dateM1], or
maybe [clientName] and a value of [dateMP1, dateM1]. There can be hundreds of
thousands of meetings. The problem is to generate the key triplet when there's
no common ID between the documents.
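Since a CouchDB map function only ever sees one document at a time, the triplet has to be assembled outside the view. A minimal client-side Python sketch under that assumption (the docs dict stands in for GETs by _id; the function names are hypothetical):

```python
from collections import defaultdict
from datetime import date

# Stand-in for the database: _id -> doc.
docs = {
    "M1":  {"type": "Meeting", "meetingProposalID": "MP1", "date": "2010-09-09"},
    "MP1": {"type": "MeetingProposal", "projectPartID": "PP1", "date": "2010-10-10"},
    "PP1": {"type": "ProjectPart", "projectID": "P1"},
    "P1":  {"type": "Project", "clientID": "C1"},
    "C1":  {"type": "Client", "name": "John"},
}

def report_rows():
    """Yield [clientName, proposalDate, meetingDate] for every Meeting by
    walking the document chain client-side; no single map function can
    produce this key, since no common ID links all five doc types."""
    for doc in docs.values():
        if doc.get("type") != "Meeting":
            continue
        proposal = docs[doc["meetingProposalID"]]
        part = docs[proposal["projectPartID"]]
        project = docs[part["projectID"]]
        client = docs[project["clientID"]]
        yield [client["name"], proposal["date"], doc["date"]]

def average_delay_per_client():
    """Average days between proposal and meeting, per client. (Negative
    here only because the sample meeting date precedes its proposal.)"""
    delays = defaultdict(list)
    for name, d_mp, d_m in report_rows():
        days = (date.fromisoformat(d_m) - date.fromisoformat(d_mp)).days
        delays[name].append(days)
    return {name: sum(ds) / len(ds) for name, ds in delays.items()}
```

This works, but it is the million-doc-pull problem in miniature: every Meeting costs four extra fetches, which is what a single view with the right key would avoid.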

> I assume that you are getting data out of your legacy MySQL system using 
> complex joins?

Yes, although the joins aren't complex; the data model is pretty
straightforward, with docs mostly in a chain. 

> Have you considered totally denormalising your data and input data to 
> couchdb based on the output of your MySQL reports?

Yes, but that wouldn't really work: each document can still be updated on its
own, with maybe a few thousand updates a day, which is little but enough to
cause heavy update contention on massively denormalised documents.

> Perhaps couchdb-lucene (or my current fav of the moment elasticsearch 
> which is also based on lucene) would be useful?

I have already set it up, and modified it to make simple doc joins without
fuss, which is good enough for run-of-the-mill searching. It wouldn't solve the
million-doc-pull problem, though, and joining is obviously pretty slow.

But thanks for the proposals :)
