Date: Fri, 21 Mar 2008 14:55:26 -0700
From: "Chris Anderson"
Sender: jchris@gmail.com
To: couchdb-user@incubator.apache.org
Subject: Sphinx integration (was: Working on Lucene)

On Fri, Mar 21, 2008 at 1:34 PM, Jan Lehnardt wrote:
> Thanks for the input. This is actually an implementation detail of
> the Indexer, but I agree that this should be supported. I also think
> we should have some standard way here so other search solutions
> can be plugged in without breaking things.
>

Jan,

Some thoughts about Sphinx integration.

The HTTP API as it currently stands (just the ability to page through
an entire view) is sufficient to implement Sphinx indexing on views as
an external process. However, Sphinx requires that each document it
indexes have a unique, numerical ID, so the CouchDB document ID, being
a string, is not suitable as-is.

Using a map function that emits once per document (or using
Reduce/Combine when it becomes available), coupled with a function to
deterministically convert CouchDB document IDs into integers, should
make for views which can be easily indexed by Sphinx.
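
As a rough illustration of the deterministic conversion mentioned above, here is a minimal sketch of a docIDtoInteger helper. The FNV-1a hash is my own pick for the example, not something CouchDB or Sphinx provides, and it assumes the Sphinx ID has to be a non-zero unsigned integer that fits in 32 bits. A hash like this is deterministic but not reversible, and two document IDs can collide, so treat it as a starting point only.

```javascript
// Hypothetical helper: hashes a CouchDB document ID (a string) into an
// unsigned 32-bit integer using FNV-1a over the string's UTF-16 code units.
// Assumption: the indexer wants a non-zero unsigned 32-bit ID.
function docIDtoInteger(docId) {
  var hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (var i = 0; i < docId.length; i++) {
    hash ^= docId.charCodeAt(i);
    // Multiply by the FNV prime, keeping the result in unsigned 32 bits.
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  // Avoid returning 0, on the assumption that 0 is not a valid ID.
  return hash === 0 ? 1 : hash;
}
```

Since each view row also carries the ID of the document that produced it, the integer does not strictly need an inverse function: looking the integer up as a key in the same view should be enough to get back to the CouchDB document ID.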

The map function might look like this:

  function(doc) {
    if (doc.title) {
      // one row per document, keyed by the integer form of its ID
      map(docIDtoInteger(doc._id), doc.title);
    }
  }

It's too bad that Sphinx doesn't support arbitrary strings as document
IDs, but I'm sure there are plenty of reversible string-to-integer
mappings that could be used. In that case Sphinx would be queried and
return a list of matching integer IDs, which could be mapped back to
CouchDB document IDs and then retrieved from CouchDB.

This thought experiment is encouraging because it shows that even
without integration into CouchDB, some very useful custom full-text
indexes could be created.

AFAIK Sphinx's support for updating indexes is limited to merging new
documents into the index, so it would have little use for an API to
find view rows which have been changed or removed. Luckily, index
rebuild is lightning fast.

--
Chris Anderson
http://jchris.mfdz.com