Date: Wed, 6 Jan 2010 10:48:32 -0800
Subject: Re: Building IFI View for Text Queries
From: Chris Anderson <jchris@apache.org>
To: user@couchdb.apache.org

On Wed, Jan 6, 2010 at 10:10 AM, Nic Pottier <nicpottier@gmail.com> wrote:
> Howdy All,
>
> New user playing with CouchDB to evaluate whether it will work for our
> needs. I have a good bit of experience with standard SQL and recently
> with Amazon's SimpleDB, but I'll admit my brain is stretching a bit to
> get the 'couch db' way of doing things.
>
> Anyways, in my particular case, I have a set of records, let's say
> they are websites, which have an id of their URL, and various
> attributes, including the 'title' of the URL.
>
> I want the ability to be able to find all sites which contain a
> particular word in their title. I know that isn't directly supported
> in couch-db, and that there is a Lucene 'add on', but I'd rather avoid
> that if possible.
>
> What I have tried is to create a view that is built by doing basic
> tokenization of the titles, emitting each individual word in lowercase
> with a null value. Once created this acts as an inverted file index,
> allowing me to find all the documents that contain a particular word
> etc. And it seems to work ok, it is fast, and updating documents
> seems reasonably fast as well. I can also do 'OR' queries using the
> keys POST call on the view, which satisfies my requirements perfectly.
>
> What's the catch? Is this ok to do? Any gotchas I should be aware of?
>
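
For anyone following along, a view like the one described above might
look roughly like the sketch below. This is only an illustration under
assumptions: the `title` field name, the sample documents, and the
helper names (mapTitleWords, indexDocs, lookup) are hypothetical, and
the emit() stub exists only so the code runs under plain Node.js. In a
real design document the map function would be anonymous and CouchDB
would supply emit() itself.

```javascript
// Stand-in for the emit() that CouchDB's view server supplies to map
// functions; defined here only so the sketch runs outside CouchDB.
var rows = [];
var currentDocId = null;
function emit(key, value) {
  rows.push({ id: currentDocId, key: key, value: value });
}

function has(obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Map function: one row per distinct lowercase word in the title.
// Assumes a string `title` field (hypothetical document shape).
// The tokenizer is deliberately naive: it splits on anything that is
// not an ASCII letter or digit, so accented characters are dropped.
function mapTitleWords(doc) {
  if (typeof doc.title !== "string") return;
  var seen = {};
  var words = doc.title.toLowerCase().split(/[^a-z0-9]+/);
  for (var i = 0; i < words.length; i++) {
    var word = words[i];
    if (word && !has(seen, word)) {
      seen[word] = true;
      emit(word, null); // null value keeps each index row small
    }
  }
}

// Local simulation of building the view from a set of documents.
function indexDocs(docs) {
  rows = [];
  docs.forEach(function (doc) {
    currentDocId = doc._id;
    mapTitleWords(doc);
  });
  return rows;
}

// Local simulation of a multi-key ("OR") lookup. A document matching
// several keys appears once per key in the view, so ids are deduplicated.
function lookup(indexRows, keys) {
  var ids = {};
  indexRows.forEach(function (row) {
    if (keys.indexOf(row.key) !== -1) ids[row.id] = true;
  });
  return Object.keys(ids).sort();
}

var sampleDocs = [
  { _id: "http://couchdb.apache.org/", title: "Apache CouchDB" },
  { _id: "http://lucene.apache.org/", title: "Apache Lucene - Welcome" },
  { _id: "http://example.com/", title: "Example Domain" }
];
console.log(lookup(indexDocs(sampleDocs), ["couchdb", "lucene"]));
```

Against a real database, the 'OR' query would be a POST to the view
with a JSON body such as {"keys": ["couchdb", "lucene"]}. A document
that matches more than one key comes back once per matching key, so
the client has to deduplicate, as lookup() does above.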

The only catch is that you'll end up with a large index file in the
long run. Lucene's indexes should be more compact on disk. Lucene also
has more stemming options and will generally be smarter than your
tokenizer.

That said, if it works, it works.

> Thanks,
>
> -Nic
>


-- 
Chris Anderson
http://jchrisa.net
http://couch.io