Date: Thu, 17 Jan 2013 13:08:01 +0100
Subject: Refactoring a CouchDB.
From: Tim Hankins
To: user@couchdb.apache.org

Hi,

I'm a student programmer at the IT University of Copenhagen, and have inherited a CouchDB application which I'm having trouble scaling. I believe it may need to be refactored.

Specifically, the problem seems to come from the use of filtered replication. (All user documents are stored in the same database, so replicating from server to client requires filtered replication.)

I'm in the process of reading Chapter 23 of "O'Reilly: CouchDB - The Definitive Guide", which deals with high performance, and "O'Reilly: Scaling CouchDB". Any other suggestions about the following would be greatly appreciated!

Some background...

The system is part of a clinical trial undertaken by the ITU and the Danish State Hospital. It aims to help bipolar patients manage their disease. It is composed of:

1). 100+ Android phones running a client application and Couchbase Mobile.
2). A web server backed by CouchDB.

Each day, the Android client application collects two kinds of data: subjective and objective. Subjective data are manually entered by patients. Objective data are gathered from the phone's sensors.

Subjective and objective data are stored in their own Couch documents, and have IDs that include the user's ID, the document type, and the date in a "DD_MM_YYYY" format. They are replicated once a day by placing replication docs in the "_replicator" database.

Once replicated to the server, these documents are:

1). Used as input to a data mining algorithm.
2). Displayed on a web page. (Users can see their own data, and clinicians can see the data for all users.)

The data mining algorithm produces a new CouchDB document for each user every day, which we call an "Impact Factor" document. (It looks at each user's historical objective and subjective data, and looks for correlations.)

Replication:

Replication takes place from client to server, and from server to client.

1). Client to server: This seems to be working fine.
2). Server to client: This is what's broken.

Two things have to be replicated from server to client:

1). Each user's subjective data for the past 14 days.
2). Each user's Impact Factor document for the current day.

Since all user documents are stored in the same database, we use filtered replication to send the right docs to the right users. The problem is that the filtered replication takes too long (more than 10 minutes).

1). To test whether the filter function is crashing, I replicated the entire DB to another, unloaded machine, and there it seems to run just fine. (Well, it takes about 2.5 minutes, but it doesn't crash.)
2). I've tried rewriting the filter function in Erlang, but haven't managed to get it working.

And besides, I suspect that the way the DB is structured is just not suited to the job.

So, to summarize...

- Android client phones produce new CouchDB docs and replicate them to the server.
- One central CouchDB holds all users.
- Both individual and group data are served to web pages.
- A data mining algorithm processes this data on a per-user basis.
- Subjective data and Impact Factor documents are replicated from the server to each client phone.

Is there a way to structure the DB so that users can replicate without the need for filters, but which preserves the ability of clinicians to see an overview of all users? (It's my understanding that views can't be run *across* databases.)

Well, as before, any suggestions or pointers would be much appreciated.

Cheers,
Tim.
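
To make the setup described above concrete, here is a minimal Python sketch of the ID scheme and of the kind of document that gets placed in "_replicator" for the server-to-client pull. Much of it is assumed for illustration rather than taken from the real application: the field order and underscore separator in the IDs, the type names "subjective" and "impact_factor", the filter name "app/by_user" with its "user" parameter, the database names, and the reading of "past 14 days" as today plus the 13 preceding days. Only "source", "target", "filter" and "query_params" are standard replication document fields.

```python
from datetime import date, timedelta


def doc_id(user_id: str, doc_type: str, day: date) -> str:
    """Build an ID from user ID, document type and a DD_MM_YYYY date.

    The order of the parts and the "_" separator are assumptions.
    """
    return f"{user_id}_{doc_type}_{day.strftime('%d_%m_%Y')}"


def wanted_ids(user_id: str, today: date, days: int = 14) -> list:
    """IDs one phone should receive from the server on a given day:
    the subjective docs for the last `days` days (today included, an
    assumption) plus today's Impact Factor doc."""
    ids = [
        doc_id(user_id, "subjective", today - timedelta(days=n))
        for n in range(days)
    ]
    ids.append(doc_id(user_id, "impact_factor", today))
    return ids


def filtered_replication_doc(user_id: str, source: str, target: str) -> dict:
    """A replication document for the "_replicator" database that pulls
    one user's docs through a filter. "app/by_user" and the "user"
    parameter are hypothetical names."""
    return {
        "_id": f"pull_{user_id}",
        "source": source,
        "target": target,
        "filter": "app/by_user",
        "query_params": {"user": user_id},
    }


if __name__ == "__main__":
    today = date(2013, 1, 17)
    for wanted in wanted_ids("u42", today):
        print(wanted)
    print(filtered_replication_doc("u42", "http://server:5984/trial", "trial"))
```

The point of the sketch is that the set of documents each phone needs on a given day is small and fully determined by the user ID and the date, while the filter still has to be evaluated against the changes of every user in the shared database.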