Date: Wed, 6 Jan 2010 10:48:32 -0800
Subject: Re: Building IFI View for Text Queries
From: Chris Anderson
To: user@couchdb.apache.org

On Wed, Jan 6, 2010 at 10:10 AM, Nic Pottier wrote:
> Howdy All,
>
> New user playing with CouchDB to evaluate whether it will work for our
> needs. I have a good bit of experience with standard SQL and recently
> with Amazon's SimpleDB, but I'll admit my brain is stretching a bit to
> get the 'couch db' way of doing things.
>
> Anyways, in my particular case, I have a set of records, let's say
> they are websites, which have an id of their URL, and various
> attributes, including the 'title' of the URL.
>
> I want the ability to be able to find all sites which contain a
> particular word in their title. I know that isn't directly supported
> in couch-db, and that there is a Lucene 'add on', but I'd rather avoid
> that if possible.
>
> What I have tried is to create a view that is built by doing basic
> tokenization of the titles, emitting each individual word in lowercase
> with a null value. Once created this acts as an inverted file index,
> allowing me to find all the documents that contain a particular word
> etc.. And it seems to work ok, it is fast, and updating documents
> seems reasonably fast as well. I can also do 'OR' queries using the
> keys POST call on the view, which satisfies my requirements perfectly.
>
> What's the catch? Is this ok to do? Any gotchas I should be aware of?
>

The only catch is that you'll end up with a large index file in the
long run. Lucene's indexes should be more compact on disk. Lucene also
has more stemming options and will generally be smarter than your
tokenizer. That said, if it works, it works.

> Thanks,
>
> -Nic
>

--
Chris Anderson
http://jchrisa.net
http://couch.io
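
For readers following the thread, here is a minimal sketch of the approach Nic describes: tokenize each title, emit one lowercase word per row with a null value, then look up several keys at once for an 'OR' query. It is an in-memory Python simulation, not CouchDB code (a real view's map function would be written in JavaScript), and the names `map_title_words`, `build_index`, `query_keys` and the sample documents are all hypothetical.

```python
import re
from collections import defaultdict


def map_title_words(doc):
    """Yield (key, value) rows, mimicking what a view's map function would emit.

    Assumption: "basic tokenization" means lowercasing and splitting on
    anything that is not a letter or digit. Each word is emitted once per doc.
    """
    title = doc.get("title", "")
    for word in sorted(set(re.findall(r"[a-z0-9]+", title.lower()))):
        yield word, None  # null value, as in the original description


def build_index(docs):
    """Build word -> [doc ids], i.e. a small inverted file index."""
    index = defaultdict(list)
    for doc in docs:
        for key, _value in map_title_words(doc):
            index[key].append(doc["_id"])
    return index


def query_keys(index, keys):
    """'OR' query: ids of documents matching any of the given keys."""
    matches = []
    for key in keys:
        for doc_id in index.get(key.lower(), []):
            if doc_id not in matches:
                matches.append(doc_id)
    return matches


# Hypothetical sample records: the id is the URL, plus a 'title' attribute.
docs = [
    {"_id": "http://couchdb.apache.org", "title": "Apache CouchDB"},
    {"_id": "http://lucene.apache.org", "title": "Apache Lucene"},
    {"_id": "http://example.com", "title": "An Example Site"},
]
index = build_index(docs)
```

Calling `query_keys(index, ["couchdb", "example"])` returns the ids of every document whose title contains either word. The index holds one row per distinct word per document, which is why it grows large over time, as noted in the reply.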