To: user@couchdb.apache.org
From: Nicolas Jessus
Subject: Re: Forcing document reindex
Date: Wed, 17 Nov 2010 18:00:00 +0000 (UTC)
Hello Cliff,

> I am not sure if I fully understand your use case (however it does sound
> intriguing and unusual).

Sorry, I'll try to be clearer. I should have started with a real case; I just didn't want to be needlessly verbose (failed!).

Consider five types of documents:

  type: Meeting
  _id: M1
  meetingProposalID: MP1
  date: 2010-09-09

  type: MeetingProposal
  _id: MP1
  projectPartID: PP1
  date: 2010-10-10

  type: ProjectPart
  _id: PP1
  projectID: P1

  type: Project
  _id: P1
  clientID: C1

  type: Client
  _id: C1
  name: John

ProjectPart could be denormalised into Project, but let's ignore that.

Say I would like to know the average time between a meeting proposal and the actual meeting, per client, to see what kind of delay I should expect. This is a simple report; others are much more complex, so I'm really looking to solve the general-case problem.

Naively, the view key should be something like [clientName, dateMP1, dateM1], or maybe [clientName] with a value of [dateMP1, dateM1]. There can be hundreds of thousands of meetings. The problem is generating that key triplet when there is no common ID shared across the documents.

> I assume that you are getting data out of your legacy MySQL system using
> complex joins?

Yes, although the joins aren't complex; the data model is pretty straightforward, with docs mostly in a chain.

> Have you considered totally denormalising your data and input data to
> couchdb based on the output of your MySQL reports?
Yes, but that would not really work: each document can still be updated on its own, with maybe a few thousand updates a day. That is not much, but it is enough to cause massive locking if the documents are massively denormalised.

> Perhaps couchdb-lucene (or my current fav of the moment, elasticsearch,
> which is also based on lucene) would be useful?

I have already set it up, and modified it to make simple doc joins without fuss, which is good enough for run-of-the-mill searching. It wouldn't solve the million-doc-pull problem, though, and joining is obviously pretty slow.

But thanks for the proposals :)
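As an aside, here is roughly what the chain walk behind that [clientName, dateMP1, dateM1] triplet looks like when done client-side in plain JavaScript. Since a CouchDB map function only ever sees one document at a time, this join can't happen inside a view; the docs array and the byId/keyForMeeting helpers below are purely illustrative, using the example documents from earlier.

```javascript
// Example documents from the thread, held in memory for illustration.
const docs = [
  { type: "Meeting",         _id: "M1",  meetingProposalID: "MP1", date: "2010-09-09" },
  { type: "MeetingProposal", _id: "MP1", projectPartID: "PP1",     date: "2010-10-10" },
  { type: "ProjectPart",     _id: "PP1", projectID: "P1" },
  { type: "Project",         _id: "P1",  clientID: "C1" },
  { type: "Client",          _id: "C1",  name: "John" },
];

// Index documents by _id for O(1) chain lookups.
const byId = {};
for (const doc of docs) byId[doc._id] = doc;

// Walk Meeting -> MeetingProposal -> ProjectPart -> Project -> Client
// and build the [clientName, proposalDate, meetingDate] key triplet.
function keyForMeeting(meeting) {
  const proposal = byId[meeting.meetingProposalID];
  const part     = byId[proposal.projectPartID];
  const project  = byId[part.projectID];
  const client   = byId[project.clientID];
  return [client.name, proposal.date, meeting.date];
}

const keys = docs.filter(d => d.type === "Meeting").map(keyForMeeting);
console.log(keys); // [["John", "2010-10-10", "2010-09-09"]]
```

With hundreds of thousands of meetings this is exactly the million-doc pull the thread is about: every document in the chain has to be fetched before a single key can be emitted.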