From: Olafur Arason
Date: Mon, 28 Mar 2011 09:15:36 +0000
Subject: Re: Full text search - is it coming? If yes, approx when.
To: user@couchdb.apache.org

There does not seem to be much understanding that this could be a killer feature. People currently rely on Lucene, which monitors the _changes feed.

Cloudant has done its own implementation. As far as I can gather from the information they have published, it builds a view out of all the words in your documents, and they recommend a Java view because you can then reuse the lexer from Lucene. I think they then reuse the view reader to run the query, and the query interface has a syntax similar to Lucene's. They are still working on this, and I don't think they have much incentive to open-source it right away. But they have open-sourced their technology in the past, BigCouch for example, so I think it's more a matter of when than if.

I think this is a good solution for full-text search. However, I don't think the Java view server has direct access to the data, so it could be slow. Cloudant does cluster view generation, though, so that helps.
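To make the "view out of all your words" idea concrete, here is a minimal Python sketch of an inverted index built the way a CouchDB view is: a map step emits one (word, doc id) row per distinct word, the rows are kept sorted by key, and a search is a key range scan over them. This is not Cloudant's implementation, which has not been published. The names tokenize, map_doc, build_view, query_range, search_word and search_prefix are hypothetical; the regex tokenizer stands in for Lucene's analyzer, and Python's code-point string ordering stands in for CouchDB's ICU collation.

```python
import bisect
import re
from typing import Dict, Iterator, List, Tuple

Row = Tuple[str, str]  # (key, doc id), the shape of a view row in this sketch


def tokenize(text: str) -> List[str]:
    # Hypothetical stand-in for a real analyzer such as Lucene's.
    return re.findall(r"\w+", text.lower())


def map_doc(doc: Dict[str, str]) -> Iterator[Row]:
    # Like a map function that calls emit(word, null) once per distinct word.
    for word in sorted(set(tokenize(doc.get("body", "")))):
        yield (word, doc["_id"])


def build_view(docs: List[Dict[str, str]]) -> List[Row]:
    # The index is kept sorted by key, then by doc id.
    return sorted(row for doc in docs for row in map_doc(doc))


def query_range(rows: List[Row], startkey: str, endkey: str) -> List[str]:
    # Binary-search the sorted keys, like a startkey/endkey range scan
    # with an inclusive end.
    keys = [key for key, _ in rows]
    lo = bisect.bisect_left(keys, startkey)
    hi = bisect.bisect_right(keys, endkey)
    return [doc_id for _, doc_id in rows[lo:hi]]


def search_word(rows: List[Row], word: str) -> List[str]:
    word = word.lower()
    return query_range(rows, word, word)


def search_prefix(rows: List[Row], prefix: str) -> List[str]:
    # "\ufff0" sorts after ordinary characters, a common endkey idiom for prefixes.
    prefix = prefix.lower()
    return query_range(rows, prefix, prefix + "\ufff0")


docs = [
    {"_id": "a", "body": "CouchDB views are sorted"},
    {"_id": "b", "body": "Full text search in CouchDB"},
    {"_id": "c", "body": "Sorting ranges"},
]
rows = build_view(docs)
print(search_word(rows, "CouchDB"))  # ['a', 'b']
print(search_prefix(rows, "sort"))   # ['a', 'c']
```

Because each word is its own key, an exact word lookup and a prefix lookup both become contiguous slices of the sorted index, which is the access pattern views are already good at.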
But there is also a general problem with the current view system where search technology could help. Views are really good at sorting, but people are using them to match keys, which they are not designed for. The startkey and endkey parameters select sorted ranges; they are not good for matching, even though that is what most resources online suggest using them for. For example, when you query with:

startkey = ["key11", "key21"]
endkey = ["key19", "key21"]

you get ["key11", "key22"], ["key11", "key23"] ... ["key12", "key21"], ["key12", "key22"] ..., which makes sense for a sorted range but not for matching keys. You can do a range lookup, but only on the last element of the key, never on two elements at once. So this would work:

startkey = ["key21", "key11"]
endkey = ["key21", "key19"]

The current view interface could be augmented to accept queries. That would make views much more powerful than they are now, while the keys keep doing what they are designed for and do really well: sorting and selecting which values are returned. This would be a killer feature and could build on the new infrastructure from Cloudant search. And don't tell me the ElasticSearch or Lucene interface could do anything close to this :)

Regards,
Olafur Arason

On Mon, Mar 28, 2011 at 04:31, Andrew Stuart (SuperCoders) wrote:
> It would be good to know if full text search is coming as a core feature and
> if yes, approximately when - does anyone know?
>
> Even an approximate timeframe would be good.
>
> thanks
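To illustrate the startkey/endkey behaviour described in the reply above, this Python sketch simulates a view index as a sorted list of two-element keys and treats a startkey/endkey query as a contiguous slice of it. view_range is a hypothetical helper, not a CouchDB API. Python's element-by-element tuple comparison is assumed to match CouchDB's array collation for these plain ASCII keys, although CouchDB itself collates strings with ICU.

```python
from typing import List, Tuple

Key = Tuple[str, str]


def view_range(keys: List[Key], startkey: Key, endkey: Key) -> List[Key]:
    """Hypothetical stand-in for a view query with startkey/endkey.

    The index is sorted, and the query returns every key between startkey
    and endkey inclusive (CouchDB's default inclusive_end=true). Tuples are
    compared element by element, like CouchDB arrays.
    """
    return [key for key in sorted(keys) if startkey <= key <= endkey]


# Every combination of "key11".."key19" with "key21".."key29": 81 keys.
keys: List[Key] = [(f"key1{i}", f"key2{j}") for i in range(1, 10) for j in range(1, 10)]

# A range over both elements: this is NOT "second element == key21".
wide = view_range(keys, ("key11", "key21"), ("key19", "key21"))
print(len(wide))  # 73 rows, including ("key11", "key22") and ("key12", "key21")

# Emit the keys in the other order so the fixed part comes first and the
# range applies only to the last element.
swapped: List[Key] = [(second, first) for first, second in keys]
narrow = view_range(swapped, ("key21", "key11"), ("key21", "key19"))
print(len(narrow))  # 9 rows, all starting with "key21"
```

With 81 keys, the first query returns 73 rows instead of the 9 whose second element is "key21". Emitting the keys in the swapped order turns the same lookup into a fixed first element plus a range on the last one, which returns exactly those 9.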