incubator-couchdb-user mailing list archives

From Nils Adermann <nader...@naderman.de>
Subject Re: Sphinx license
Date Sat, 29 Mar 2008 02:23:02 GMT
Chris Anderson wrote:
> Sphinx is not the best contender for integration, because of its
> limited support for incremental updates. It is, however, a good
> boundary condition on how to design the Indexer API so that a wide
> range of search engines can work with CouchDB.
>   
Sphinx is going to support real-time updates in one of the next few 
releases, so that won't be a problem much longer.

However, there's a different problem with using Sphinx to search CouchDB: 
Sphinx is not designed to index documents with differing structures. All 
documents in an index have to follow the same structure. You can still 
use Sphinx with CouchDB very well if you only index views. You have to 
know the exact structure of all view results; then you can tell 
Sphinx about the structure and it will be able to index the results.

But if you want to search any arbitrary CouchDB database, it gets a 
lot more complicated. Sphinx only supports a fixed number of fulltext 
searchable text fields per document (32). That number is definitely high 
enough for most documents, but it does not reflect CouchDB's 
flexibility. In order to use Sphinx on a dynamic schema you would have 
to go through all documents to create a mapping of the hierarchically 
stored values into a one-dimensional associative array (two-dimensional 
for the multivalue attributes) and then store this mapping with each 
document. Then you can go through the documents and extend the static 
schema for every document that requires an additional field. You can 
either reuse fields, which makes grouping and sorting useless 
because each field has a different meaning for each document, or 
leave a lot of fields empty, creating a huge overhead.
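
A minimal sketch of the flattening step described above, in Python. The 
dotted-path key naming and the function name flatten are assumptions made 
for illustration; they are not part of Sphinx or CouchDB. Nested objects 
become dotted keys, and lists become multi-value entries (the second 
dimension).

```python
def flatten(value, prefix=""):
    """Flatten a nested JSON-like value into a one-level dict.

    Keys are dotted paths (an illustrative convention, not a Sphinx or
    CouchDB one). Values that sit inside a list are collected into a list
    under the same path, which gives the "multi-value" second dimension.
    """
    if isinstance(value, dict):
        flat = {}
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(child, path))
        return flat
    if isinstance(value, list):
        flat = {}
        for item in value:
            for path, v in flatten(item, prefix).items():
                values = v if isinstance(v, list) else [v]
                flat.setdefault(path, []).extend(values)
        return flat
    # Scalar leaf: a single entry keyed by the path accumulated so far.
    return {prefix: value}


doc = {
    "_id": "post-1",
    "title": "Hello",
    "author": {"name": "Nils"},
    "tags": ["couchdb", "sphinx"],
    "comments": [{"by": "a"}, {"by": "b"}],
}
flat_doc = flatten(doc)
```

Every distinct path that shows up across the database would need its own 
field in the static schema, which is where the 32-field limit and the 
empty-field overhead come in.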

An alternative would be to create a lot of indexes with different 
schemas, as Sphinx supports searching multiple indexes at a time. But I 
doubt this idea scales well if you have a different schema on every 
document.
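
To get a feel for how many indexes that would mean, one could group 
documents by their set of field names. This is only a rough sketch: 
treating the top-level field names as the "schema", and the helper names 
schema_signature and group_by_schema, are assumptions for illustration.

```python
from collections import defaultdict


def schema_signature(doc):
    """Return the set of top-level field names, ignoring CouchDB's
    underscore-prefixed fields such as _id and _rev."""
    return frozenset(key for key in doc if not key.startswith("_"))


def group_by_schema(docs):
    """Map each distinct signature to the ids of the documents sharing it.

    Each group would need its own index with its own static schema.
    """
    groups = defaultdict(list)
    for doc in docs:
        groups[schema_signature(doc)].append(doc["_id"])
    return dict(groups)


docs = [
    {"_id": "a", "title": "x", "body": "y"},
    {"_id": "b", "body": "z", "title": "w"},
    {"_id": "c", "name": "n"},
]
groups = group_by_schema(docs)
```

If the number of groups grows with the number of documents, the 
multi-index approach stops being practical.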

So my approach to integration was rather to allow Sphinx to use CouchDB 
as a data source. You can then configure Sphinx to index a certain view, 
and the view has to produce one-dimensional JSON results that work 
for Sphinx. Searching does not use CouchDB's REST API at all in that case. 
This method works fine for applications where many documents have the same 
structure (like the demo forum or an article/comments site like a blog) 
or for applications where the number of structures that documents can 
have is limited (you can then create a mapping to one larger common 
structure). However, this will not be useful to any application that really 
makes use of CouchDB's flexible structure, so I certainly hope there'll 
be other systems available for searching.
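
As a rough illustration of mapping a limited number of structures onto 
one larger common structure, here is a Python sketch. The field list and 
the function name to_common_row are hypothetical, and the code does not 
show how Sphinx actually reads the view; it only shows the shape of the 
flat rows such a view would have to produce.

```python
MAX_TEXT_FIELDS = 32  # the fulltext field limit mentioned in this message


def to_common_row(doc, fields):
    """Project a flat document onto one fixed, ordered list of fields.

    Fields the document lacks are filled with an empty string, so every
    row has the same structure. Fields not in the list are dropped.
    """
    if len(fields) > MAX_TEXT_FIELDS:
        raise ValueError(
            f"{len(fields)} fields exceed the limit of {MAX_TEXT_FIELDS}"
        )
    return {name: doc.get(name, "") for name in fields}


# Hypothetical common structure covering forum posts and blog comments.
COMMON_FIELDS = ["title", "body", "author"]

post = {"title": "Hi", "body": "First post", "author": "nils"}
comment = {"body": "Nice", "author": "chris", "rating": 5}

rows = [to_common_row(d, COMMON_FIELDS) for d in (post, comment)]
```

The empty strings are the overhead mentioned earlier: the more 
structures are folded into the common one, the more fields stay empty 
in each row.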

Cheers!
Nils
