incubator-couchdb-user mailing list archives

From Chris Anderson <jch...@apache.org>
Subject Re: Building IFI View for Text Queries
Date Wed, 06 Jan 2010 18:48:32 GMT
On Wed, Jan 6, 2010 at 10:10 AM, Nic Pottier <nicpottier@gmail.com> wrote:
> Howdy All,
>
> New user playing with CouchDB to evaluate whether it will work for our
> needs.  I have a good bit of experience with standard SQL and recently
> with Amazon's SimpleDB, but I'll admit my brain is stretching a bit to
> get the 'couch db' way of doing things.
>
> Anyway, in my particular case I have a set of records, let's say
> websites, each of which has its URL as its id and various
> attributes, including the 'title' of the site.
>
> I want to be able to find all sites that contain a particular word
> in their title.  I know that isn't directly supported in CouchDB,
> and that there is a Lucene add-on, but I'd rather avoid that if
> possible.
>
> What I have tried is to create a view built by doing basic
> tokenization of the titles, emitting each individual word in lowercase
> with a null value.  Once created, this acts as an inverted file index,
> allowing me to find all the documents that contain a particular word.
> It seems to work OK: it is fast, and updating documents seems
> reasonably fast as well.  I can also do 'OR' queries using the
> keys POST call on the view, which satisfies my requirements perfectly.
>
> What's the catch?  Is this ok to do?  Any gotchas I should be aware of?
>

The only catch is that you'll end up with a large index file in the
long run. Lucene's indexes should be more compact on disk. Lucene also
has more stemming options and will generally be smarter than your
tokenizer.

That said, if it works, it works.
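For readers who want to see the mechanics, here is a rough Python sketch of the inverted-index view described in the question. It is an emulation, not CouchDB code: real map functions run as JavaScript in the view server. The tokenizer regex, the per-title de-duplication of words, and the helper names (`tokenize`, `map_doc`, `build_view`, `query_keys`, `or_query`) are all hypothetical. `query_keys` stands in for POSTing a `keys` list to the view, and Python's string ordering stands in for CouchDB's collation.

```python
import re

# Hypothetical tokenizer: lowercase the title and keep runs of ASCII letters/digits.
# No stemming is done, so "couch" and "couchdb" are unrelated tokens.
WORD_RE = re.compile(r"[a-z0-9]+")


def tokenize(title):
    """Split a title into lowercase word tokens."""
    return WORD_RE.findall(title.lower())


def map_doc(doc):
    """Mimic emit(word, null) once per distinct word in the doc's title."""
    for word in sorted(set(tokenize(doc.get("title", "")))):
        yield word, None


def build_view(docs):
    """Collect (key, doc_id, value) rows, sorted by key and then by doc id."""
    rows = []
    for doc in docs:
        for key, value in map_doc(doc):
            rows.append((key, doc["_id"], value))
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows


def query_keys(rows, keys):
    """Mimic a keys query: matching rows, grouped in the order the keys were given."""
    result = []
    for key in keys:
        result.extend(row for row in rows if row[0] == key)
    return result


def or_query(rows, words):
    """Ids of docs whose title contains any of the words, without duplicates."""
    ids = []
    for _, doc_id, _ in query_keys(rows, [w.lower() for w in words]):
        if doc_id not in ids:
            ids.append(doc_id)
    return ids


# Illustrative documents; the URLs and titles are made up.
docs = [
    {"_id": "http://example.com/couch", "title": "Getting started with CouchDB"},
    {"_id": "http://example.org/lucene", "title": "Full-text search with Lucene"},
    {"_id": "http://example.net/sofa", "title": "Sofa or couch? A buyer's guide"},
]

rows = build_view(docs)
print(len(rows))  # 16: one row per distinct word per doc
print(or_query(rows, ["couchdb", "Lucene"]))
# ['http://example.com/couch', 'http://example.org/lucene']
print(or_query(rows, ["couch", "search"]))
# ['http://example.net/sofa', 'http://example.org/lucene']
```

The row count grows with every distinct word in every title, which is where the index-size cost comes from. The query for `couch` also misses the CouchDB page because tokens are matched exactly; that is the kind of case where Lucene's stemming and analysis would do better than a hand-rolled tokenizer.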

> Thanks,
>
> -Nic
>



-- 
Chris Anderson
http://jchrisa.net
http://couch.io
