incubator-couchdb-user mailing list archives

From "Chris Anderson" <jch...@mfdz.com>
Subject Sphinx integration (was: Working on Lucene)
Date Fri, 21 Mar 2008 21:55:26 GMT
On Fri, Mar 21, 2008 at 1:34 PM, Jan Lehnardt <jan@apache.org> wrote:
>  Thanks for the input. This is actually an implementation detail of
>  the Indexer, but I agree that this should be supported. I also think
>  we should have some standard way here so other search solutions
>  can be plugged in without breaking things.
>

Jan,

Some thoughts about Sphinx integration.

The HTTP API as it currently stands (just the ability to page through
an entire view) is sufficient to implement Sphinx indexing on views as
an external process.
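
As a rough sketch of what that external process might do, the loop below walks a whole view one page at a time. `fetchPage` is a hypothetical callback, not a CouchDB API: it stands in for an HTTP GET against the view and is assumed to map onto view query parameters such as `startkey`, `startkey_docid`, and a row limit. The loop asks for one row more than the page size and uses that extra row as the start of the next page.

```javascript
// Visit every row of a view, pageSize rows at a time.
// fetchPage(cursor, limit) is a HYPOTHETICAL callback that returns an array
// of rows ({id, key, value}) starting at cursor (or at the beginning of the
// view when cursor is null), at most `limit` rows long.
function eachViewRow(fetchPage, pageSize, visit) {
  var cursor = null; // null means "start at the beginning of the view"
  for (;;) {
    // Request one extra row so we know where the next page starts.
    var rows = fetchPage(cursor, pageSize + 1);
    var count = Math.min(rows.length, pageSize);
    for (var i = 0; i < count; i++) {
      visit(rows[i]);
    }
    if (rows.length <= pageSize) {
      return; // no extra row came back, so this was the last page
    }
    // The extra row is not visited now; it becomes the first row of the next page.
    cursor = { key: rows[pageSize].key, id: rows[pageSize].id };
  }
}
```

Passing both the key and the document ID in the cursor keeps paging stable when several rows share the same key.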

However, Sphinx requires that each document it indexes have a unique,
numerical ID. CouchDB document IDs are strings, so they can't be used
directly. A map function that emits once per document (or
Reduce/Combine when it becomes available), coupled with a function
that deterministically converts CouchDB document IDs into integers,
should make for views which can be easily indexed by Sphinx.

The map function might look like this:

function(doc) {
  if (doc.title) {
    // docIDtoInteger is a placeholder for the ID-to-integer conversion
    map(docIDtoInteger(doc._id), doc.title);
  }
}
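
One possible `docIDtoInteger`, as a sketch only: a 32-bit FNV-1a hash of the ID string. This is an assumption for illustration, not something CouchDB or Sphinx provides. It is deterministic but not reversible, and two IDs can collide in 32 bits, so a real setup would probably want a wider hash or a lookup table. `Math.imul` is assumed to be available in the JavaScript engine.

```javascript
// Sketch: map a document ID string to an unsigned 32-bit integer with FNV-1a.
// Deterministic, NOT reversible, and collisions are possible.
function docIDtoInteger(id) {
  var hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (var i = 0; i < id.length; i++) {
    // Uses UTF-16 code units; this matches byte-wise FNV-1a for ASCII IDs only.
    hash ^= id.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  // ">>> 0" makes the result unsigned; fall back to 1 so the ID is never zero.
  return (hash >>> 0) || 1;
}
```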

It's too bad that Sphinx doesn't support arbitrary strings as document
IDs, but I'm sure there are plenty of reversible string-to-integer
mappings that could be used. In that case Sphinx would be queried and
return a list of matching integer IDs, which could be mapped back to
CouchDB document IDs, and then retrieved from the Couch.
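
The simplest reversible mapping is probably a lookup table rather than a clever encoding. A minimal sketch, assuming the table is kept by the external indexing process and persisted somewhere between runs (`IdRegistry` and its method names are made up for illustration):

```javascript
// Sketch: a two-way table between CouchDB document IDs and small integers.
// IdRegistry is a hypothetical helper; it would need to be persisted between
// indexing runs for the integers to stay stable.
function IdRegistry() {
  this.toInt = new Map(); // document ID -> integer
  this.toDoc = [];        // (integer - 1) -> document ID
}

// Return the integer for a document ID, assigning the next free one if needed.
IdRegistry.prototype.intFor = function (docId) {
  if (!this.toInt.has(docId)) {
    this.toDoc.push(docId);
    this.toInt.set(docId, this.toDoc.length); // integers start at 1
  }
  return this.toInt.get(docId);
};

// Map an integer from a search result back to its document ID
// (undefined if the integer was never assigned).
IdRegistry.prototype.docIdFor = function (n) {
  return this.toDoc[n - 1];
};
```

The integer IDs returned by a search would go through `docIdFor` before the documents are fetched.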

This thought experiment is encouraging because it shows that even
without integration into CouchDB, some very useful custom full-text
indexes could be created. AFAIK Sphinx's support for updating indexes
is limited to merging new documents into the index, so it would have
little use for an API to find view-rows which have been changed or
removed. Luckily, index rebuild is lightning fast.
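
For a full rebuild, the external process could turn the view rows into something Sphinx can read. The sketch below assumes Sphinx's xmlpipe2 data source is used and that each row has an integer key and a title value, as in the map function above. The element names follow my reading of the Sphinx documentation and should be checked against the installed version; `rowsToXmlpipe2` and `escapeXml` are hypothetical helper names.

```javascript
// Escape the characters that are special in XML text and attributes.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Sketch: render view rows ({key: integer, value: title}) as an xmlpipe2
// document stream with a single full-text field called "title".
function rowsToXmlpipe2(rows) {
  var out = [
    '<?xml version="1.0" encoding="utf-8"?>',
    "<sphinx:docset>",
    "<sphinx:schema>",
    '<sphinx:field name="title"/>',
    "</sphinx:schema>"
  ];
  rows.forEach(function (row) {
    out.push('<sphinx:document id="' + Number(row.key) + '">');
    out.push("<title>" + escapeXml(row.value) + "</title>");
    out.push("</sphinx:document>");
  });
  out.push("</sphinx:docset>");
  return out.join("\n");
}
```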


-- 
Chris Anderson
http://jchris.mfdz.com
