Subject: Re: Full text search - is it coming? If yes, approx when.
From: Martin Hewitt
Date: Mon, 28 Mar 2011 16:16:50 +0100
To: user@couchdb.apache.org

Bit off-topic Robert, but CouchDB-lucene really is superb, we use it
extensively and it's just brilliant, so thanks for contributing it to
the Couch community.

Martin

On 28 Mar 2011, at 15:30, Robert Newson wrote:

> I am a CouchDB committer and author of couchdb-lucene. :)
>
> B.
>
> On 28 March 2011 10:44, Andrew Stuart (SuperCoders) wrote:
>> Hi Robert
>>
>> "there are no publicly known plans to build a native full-text indexing
>> feature for CouchDB."
>>
>> I don't know who is who around here as yet - are you commenting from inside
>> knowledge or as an end user/developer?
>>
>> Thanks
>>
>>
>> On 28/03/2011, at 8:24 PM, Robert Newson wrote:
>>
>> I have to dispute "There does not seem to be much understanding that
>> this could be a killer feature."
>>
>> Obviously full-text search is a killer feature, but it's trivially
>> available now via couchdb-lucene or elasticsearch.
>>
>> What people are asking for is native full-text search which, to me, is
>> essentially asking for an Erlang port of Lucene. We'd love this, but
>> it's a huge amount of work. Continually asking others to do
>> significant amounts of work is also wearying.
>>
>> To replace a Lucene-based solution and match its quality and breadth
>> is a huge chunk of work and is only necessary to satisfy people who,
>> for various reasons, don't want to use Java.
>>
>> To answer the original post, there are no publicly known plans to
>> build a native full-text indexing feature for CouchDB.
>>
>> B.
>>
>> On 28 March 2011 10:15, Olafur Arason wrote:
>>>
>>> There does not seem to be much understanding that this could be a killer
>>> feature. People are now relying on Lucene, which monitors the _changes
>>> feed.
>>>
>>> Cloudant has done its own implementation which, as far as I gather from
>>> the information they have published, makes a view out of all your words.
>>> They recommend a Java view because you can then reuse the lexer from
>>> Lucene. Then I think they are reusing the reader of the view to make
>>> their query. They have a syntax similar to Lucene's for the query interface.
>>> They are still working on this and I think they don't have that much
>>> incentive to open-source it right away. But they have open-sourced
>>> their technology in the past, BigCouch for example, so I think it's more a
>>> matter of when rather than if.
>>>
>>> I think this is a good solution for full-text search. But I don't think
>>> the Java view has direct access to the data, so it could be
>>> slow. But Cloudant does clustering on view generation, so that helps.
>>>
>>> But there is also a general problem with the current view system where
>>> search technology could be used.
>>>
>>> Views are really good at sorting, but people are using them to
>>> do key matches, which they are not designed for. The startkey and
>>> endkey are for selecting sorted ranges and are not good for matching,
>>> which is what most resources online point to.
>>>
>>> For example when you do:
>>> startkey = ["key11", "key21"]
>>> endkey = ["key19", "key21"]
>>>
>>> You get ["key11","key22"], ["key11", "key23"] ... ["key12","key21"],
>>> ["key12","key22"]...
>>> which makes sense when looking up sorted ranges but not when using it to
>>> match keys. You can do a range match lookup, but only on the
>>> last key and never on two keys. So this would work:
>>>
>>> startkey = ["key21", "key11"]
>>> endkey = ["key21", "key19"]
>>>
>>> The current view interface could be augmented to accept queries,
>>> which would make views much more powerful than they currently are,
>>> with the keys used just for sorting and for selecting which values you
>>> want shown, which is what they are designed to do and do really well.
>>>
>>> This would be a killer feature and could use the new infrastructure
>>> from Cloudant search.
>>>
>>> And don't tell me the Elastic or Lucene interface could do anything
>>> close to this :)
>>>
>>> Regards,
>>> Olafur Arason
>>>
>>> On Mon, Mar 28, 2011 at 04:31, Andrew Stuart (SuperCoders) wrote:
>>>>
>>>> It would be good to know if full text search is coming as a core feature
>>>> and if yes, approximately when - does anyone know?
>>>>
>>>> Even an approximate timeframe would be good.
>>>>
>>>> thanks
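
For anyone reading this in the archive, here is a rough sketch of the range
behaviour Olafur describes in the quoted message. It is not CouchDB code: it
only models a view as a sorted list of array keys and selects an inclusive
range, using startkey, which is the name CouchDB uses for the lower bound of
a view range, and endkey. The helper view_range and the sample keys are
made up for illustration, and plain Python list comparison stands in for
CouchDB's collation, which I am assuming gives the same order for simple
ASCII keys like these.

```python
from bisect import bisect_left, bisect_right


def view_range(sorted_keys, startkey, endkey):
    """Return every key k with startkey <= k <= endkey.

    sorted_keys must already be sorted, as the keys of a view index are.
    This is a hypothetical helper, not part of any CouchDB client.
    """
    lo = bisect_left(sorted_keys, startkey)
    hi = bisect_right(sorted_keys, endkey)
    return sorted_keys[lo:hi]


# Hypothetical keys, as if a map function emitted [first, second].
firsts = ["key11", "key12", "key19"]
seconds = ["key21", "key22", "key23"]
keys = sorted([a, b] for a in firsts for b in seconds)

# Range over the leading element: the trailing element is not matched,
# so rows such as ["key11", "key22"] fall inside the range too.
wide = view_range(keys, ["key11", "key21"], ["key19", "key21"])

# Same data emitted as [second, first]: the fixed element now leads,
# so the range only varies the trailing element.
swapped = sorted([b, a] for a in firsts for b in seconds)
narrow = view_range(swapped, ["key21", "key11"], ["key21", "key19"])

print(len(wide), wide[1])
print(narrow)
```

The first query returns seven of the nine rows, including ones whose second
element is not "key21", while the second query returns only the three rows
that start with "key21". That is the point about a range only behaving like
a match on the last element of the key.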