Message-ID: <64a10fff0807251701m1045939fq43a0a5e1f215f20@mail.gmail.com>
Date: Fri, 25 Jul 2008 20:01:13 -0400
From: "Dean Landolt"
To: couchdb-user@incubator.apache.org
Subject: Re: the search api?
In-Reply-To: <64a10fff0807210845u2387ccb9udd466225465022c7@mail.gmail.com>

On Mon, Jul 21, 2008 at 11:45 AM, Dean Landolt wrote:

> On Mon, Jul 21, 2008 at 1:08 AM, Dan Reverri wrote:
>
>> Is it worthwhile to implement a full text indexer on top of couchdb's
>> map/reduce functionality?
>>
>> http://wiki.apache.org/couchdb/FullTextIndexWithView
>
> Interesting idea. There's definitely more to FTI than tokenization alone,
> but then again there's an awful lot of power in m/r and javascript -- it
> didn't take me a second to find a Porter stemming algorithm in js:
> http://tartarus.org/~martin/PorterStemmer/js.txt
>
> I bet variable weighting would be pretty close to impossible in the m/r
> paradigm though, and probably some other features (of course, I could be
> wrong, and when it comes to couchdb, thus far I usually am). For a
> straight-up word search, this is serviceable as is. I'm going to see if I
> can't figure out how to shoehorn in some boolean features.

I gave this approach another look and was able to put together a view that
does a little more (stemming, optional case-insensitivity, a minimum token
length, better whitespace handling). I'm working on an n-gram view too, and
so far it's promising.

But there's still one huge problem -- for the life of me I can't figure out
a workable strategy for boolean operations that doesn't involve fully
loading the results for each piece of the query. Am I missing something? Is
something like this even possible? I know there's no way to load a piece of
a view from another view -- but I can't help wishing there were.
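
For illustration, here is a minimal sketch of a tokenizing view of the kind
described above, together with the client-side AND that it forces. It is not
the view from this message. The `body` field, the two constants, the one-line
`stem` placeholder (standing in for a real Porter stemmer), and the `emit`
stub and `indexDocs` helper that let it run outside CouchDB are all
assumptions made for the example.

```javascript
// Hypothetical settings; these names are illustrative only.
var MIN_LENGTH = 3;          // skip tokens shorter than this
var CASE_INSENSITIVE = true; // fold text to lower case before tokenizing

// Stand-in for CouchDB's emit(): collects rows so the sketch runs on its own.
var rows = [];
var currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// Placeholder stemmer: drops one trailing "s". A real view would inline a
// Porter stemmer here instead.
function stem(word) {
  return word.length > 3 && /[^s]s$/.test(word) ? word.slice(0, -1) : word;
}

// Map function: one row per distinct stemmed token in the (assumed) body field.
function map(doc) {
  if (typeof doc.body !== "string") return;
  var text = CASE_INSENSITIVE ? doc.body.toLowerCase() : doc.body;
  var tokens = text.split(/[^A-Za-z0-9]+/); // any run of non-alphanumerics separates tokens
  var seen = {};
  for (var i = 0; i < tokens.length; i++) {
    if (tokens[i].length < MIN_LENGTH) continue;
    var token = stem(tokens[i]);
    if (seen["t:" + token]) continue; // prefix avoids clashes with built-in property names
    seen["t:" + token] = true;
    emit(token, null);
  }
}

// Hypothetical helper: runs map() over in-memory docs, as the view server would.
function indexDocs(docs) {
  rows = [];
  for (var i = 0; i < docs.length; i++) {
    currentId = docs[i]._id;
    map(docs[i]);
  }
  return rows;
}

// Simulates one view query by key: every matching row is loaded.
function idsForTerm(viewRows, term) {
  var key = stem(CASE_INSENSITIVE ? term.toLowerCase() : term);
  var ids = [];
  for (var i = 0; i < viewRows.length; i++) {
    if (viewRows[i].key === key) ids.push(viewRows[i].id);
  }
  return ids;
}

// Client-side AND: each term's full result set is fetched, then intersected.
function searchAll(viewRows, terms) {
  var result = null;
  for (var i = 0; i < terms.length; i++) {
    var ids = idsForTerm(viewRows, terms[i]);
    result = result === null ? ids : result.filter(function (id) {
      return ids.indexOf(id) !== -1;
    });
  }
  return result || [];
}

var exampleRows = indexDocs([
  { _id: "a", body: "Searching views in CouchDB" },
  { _id: "b", body: "CouchDB views are indexes" },
  { _id: "c", body: "Full text search" }
]);
console.log(searchAll(exampleRows, ["couchdb", "views"])); // ids "a" and "b"
```

In `searchAll`, the intersection happens only after every term's rows have
been fetched, which is the cost the message is asking how to avoid.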