incubator-couchdb-user mailing list archives

From svilen ...@svilendobrev.com>
Subject Re: Refactoring a CouchDB.
Date Thu, 17 Jan 2013 12:28:17 GMT
You could add an intermediate database per user, replicate it whole
to and from the phone, and filter-replicate it to and from the whole
database (so the filtered replication happens on the server).
Eventually you could swap the roles of which database is the
main/leading one: the whole one or the per-user pieces.
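
A rough sketch of the replication docs this layout would need on the
server, one pair per user. The database names ("main", "user_<id>"),
the server URL and the filter name "app/by_user" are made up for
illustration; only the field names (source, target, filter,
query_params, continuous) are standard "_replicator" fields. It assumes
user IDs are valid as part of a CouchDB database name.

```python
def replication_docs(user_id, server="http://localhost:5984", main_db="main"):
    """Build the two server-side _replicator documents for one user.

    All names here are hypothetical: "user_<id>" for the per-user
    database, "app/by_user" for the existing filter function.
    """
    user_db = f"{server}/user_{user_id}"
    main = f"{server}/{main_db}"

    # Whole database -> per-user database: still filtered, but it runs
    # server-side, so the phone never waits on the filter.
    to_user = {
        "_id": f"main_to_user_{user_id}",
        "source": main,
        "target": user_db,
        "filter": "app/by_user",
        "query_params": {"user": user_id},
        "continuous": True,
    }

    # Per-user database -> whole database: the per-user database only
    # holds that user's docs, so no filter is needed in this direction.
    to_main = {
        "_id": f"user_{user_id}_to_main",
        "source": user_db,
        "target": main,
        "continuous": True,
    }
    return [to_user, to_main]
```

The phone would then replicate to and from its own "user_<id>" database
with no filter at all, and the clinicians' views keep running on the
whole database as before.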

This is probably the fix with the least change/impact.
ciao
svilen

 On Thu, 17 Jan 2013 13:08:01 +0100
Tim Hankins <timchankins@gmail.com> wrote:

> Hi,
> 
> I'm a student programmer at the IT University of Copenhagen, and have
> inherited a CouchDB application, which I'm having trouble scaling. I
> believe it may need to be refactored.
> 
> Specifically, the problem seems to be coming from the use of Filtered
> Replication. (All user documents are stored in the same database, and
> replicating from server to client requires filtered replication.)
> 
> I'm in the process of reading Chapter 23 of "O'Reilly: CouchDB - The
> Definitive Guide" which deals with High Performance, and "O'Reilly:
> Scaling CouchDB". Any other suggestions about the following would be
> greatly appreciated!
> 
> Some background...
> 
> The system is part of a clinical trial undertaken by the ITU and the
> Danish State Hospital. It aims to help Bipolar patients manage their
> disease. It is composed of
>     1). 100+ android phones running a client application and Couchbase
> Mobile.
>     2). A web server backed by CouchDB.
> 
> Each day, the android client application collects two kinds of data.
> Subjective and Objective. Subjective data are manually entered by
> patients. Objective data are gathered from the phone's sensors.
> 
> Subjective and Objective data are stored in their own couch
> documents, and have IDs that include the user's ID, the document
> type, and the date in a "DD_MM_YYYY" format. They are replicated once
> a day by placing replication docs in the "_replicator" database.
> 
> Once replicated to the server, these documents are...
>     1). Used as input to a data mining algorithm.
>     2). Displayed on a web page. (Users can see their own data, and
> clinicians can see the data for all users.)
> 
> The data mining algorithm produces a new CouchDB document for each
> user every day, which we call an "Impact Factor" document. (It looks
> at each user's historical objective and subjective data, and looks for
> correlations.)
> 
> Replication: Replication takes place from client to server, and from
> server to client.
>     1). Client to server: This seems to be working fine.
>     2). Server to client: This is what's broken.
> 
> Two things have to be replicated from server to client.
>     1). Each user's subjective data for the past 14 days.
>     2). Each user's Impact Factor document for the current day.
> 
> Since all user documents are stored in the same database, we use
> filtered replication to send the right docs to the right users.
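
Since the IDs are built from the user's ID, the document type and the
date, another option might be to name the wanted documents explicitly
with the replicator's "doc_ids" field instead of a filter function,
which should avoid running the JavaScript filter over every change. A
minimal sketch, assuming an ID layout of "<user>_<type>_<DD_MM_YYYY>"
and the type names "subjective" and "impact_factor" (the real layout
and names are not given above), and taking "the past 14 days" to mean
today plus the 13 days before it:

```python
from datetime import date, timedelta


def doc_id(user_id, doc_type, day):
    # Hypothetical ID layout: <user>_<type>_<DD_MM_YYYY>
    return f"{user_id}_{doc_type}_{day.strftime('%d_%m_%Y')}"


def ids_to_pull(user_id, today, days=14):
    """IDs of the docs one phone needs from the server."""
    recent = [
        doc_id(user_id, "subjective", today - timedelta(days=n))
        for n in range(days)
    ]
    return recent + [doc_id(user_id, "impact_factor", today)]


def pull_doc(user_id, today, source, target):
    """A _replicator document that names the docs instead of filtering."""
    return {
        "source": source,
        "target": target,
        "doc_ids": ids_to_pull(user_id, today),
    }
```

IDs that do not exist yet on the source are simply not replicated, so
the list can be generated without checking the database first.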
> 
> The problem is that this filter function takes too long (> 10 minutes).
>     1). To test whether the filter function is crashing, I replicated
> the entire DB to another un-loaded machine, and it seems to run just
> fine. (Well it takes about 2.5 minutes, but it doesn't crash.)
>     2). I've tried re-writing the filter function in Erlang, but
> haven't managed to get it working.
> 
> And besides, I suspect that the way the DB is structured is just not
> suited to the job.
> 
> So, to summarize...
>     - Android client phones produce new CouchDB docs and replicate
> them to the server.
>     - One central CouchDB holds all users.
>     - Both individual and group data are served to web pages.
>     - A data mining algorithm processes this data on a per-user basis.
>     - Subjective data and Impact Factor data documents are replicated
> from the server to each client phone.
> 
> Is there a way to structure the DB so that users can replicate
> without the need for filters, but which preserves the ability of
> clinicians to see an overview of all users? (It's my understanding
> that views can't be run *across* databases.)
> 
> Well, as before, any suggestions or pointers would be much
> appreciated.
> 
> Cheers,
> Tim.
