From: John Bartak
To: "user@couchdb.apache.org"
Date: Thu, 22 Jan 2009 09:26:11 -0800
Subject: RE: Newbie question: substring matching

I implemented something similar recently, except I wanted to find any word that started with a substring. What I ended up doing was emitting the first 3 characters of each word in the field as the key. I still had to do quite a bit of processing on the client side once I got back the list of potential matches from CouchDB.

The following map function outputs the first three characters of each word in the Name field of all Person objects:

function(doc) {
    // Emit one row per word, keyed by the word's first three characters (lowercased).
    function emitParts(parts, doc) {
        for (var i = 0; i < parts.length; ++i) {
            // Consecutive spaces produce empty strings; skip them.
            if (parts[i]) {
                emit(parts[i].substr(0, 3).toLowerCase(), doc);
            }
        }
    }
    if (doc.type == "Person") {
        var parts;
        if (doc.Name && typeof(doc.Name) == "string") {
            parts = doc.Name.split(new RegExp("[ ]"));
            emitParts(parts, doc);
        }
    }
}

-----Original Message-----
From: Paul Davis [mailto:paul.joseph.davis@gmail.com]
Sent: Thursday, January 22, 2009 9:09 AM
To: user@couchdb.apache.org
Subject: Re: Newbie question: substring matching

You'll never be able to have a wildcard on the front side of your
pattern with couchdb directly, and you'll only be able to have a wild
card on one end of the statement.

Something you could try:

emit(doc.field_to_search, value);
emit(string_reverse_function(doc.field_to_search), value);

Then you could do something like:

http://127.0.0.1:5984/db_name/_view/ddoc/using_like?startkey="foo"&endkey="foo\u9999"
http://127.0.0.1:5984/db_name/_view/ddoc/using_like?startkey="oof"&endkey="oof\u9999"

And then intersect the two sets client side.

Other than that, I'd look at integrating full text search.
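
A minimal sketch of the two emit calls above as a complete map function. string_reverse_function is a placeholder in the message, so reverseString below is just one possible plain-JavaScript definition (it assumes no characters outside the Basic Multilingual Plane). The rows array and the local emit() are stand-ins so the sketch runs outside CouchDB; inside a view, the view server supplies emit(). Emitting doc._id as the value is also an assumption.

```javascript
// Stand-in for CouchDB's emit(), so the sketch can run outside the view server.
var rows = [];
function emit(key, value) {
    rows.push({ key: key, value: value });
}

// Hypothetical helper: reverse a string by splitting it into UTF-16 code units.
// Assumes the text has no surrogate pairs.
function reverseString(s) {
    return s.split("").reverse().join("");
}

// Map function sketch: one row keyed by the field, one keyed by its reverse.
// A "starts with" search is a key range over the first kind of row;
// an "ends with" search is a key range over the reversed keys.
function mapBothDirections(doc) {
    if (typeof doc.field_to_search === "string") {
        emit(doc.field_to_search, doc._id);
        emit(reverseString(doc.field_to_search), doc._id);
    }
}
```

With a document whose field_to_search is "foobar", this emits the keys "foobar" and "raboof", so a query for values ending in "bar" becomes a range starting at "rab".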
HTH,
Paul Davis

On Thu, Jan 22, 2009 at 11:56 AM, Brian Candler wrote:
> Suppose I have a view which indexes a single field. Using startkey and
> endkey, it's easy to find matches which start with a particular pattern.
>
> But I'm wondering how best to do substring matches (in SQL: LIKE '%foo%')
>
> I could:
>
> 1. Read the entire view, and filter it client-side (problem: large
>    data transfer)
>
> 2. Create another view which enumerates all possible suffixes (problem:
>    large index, O(N^2))
>
>      somedata
>      omedata
>      medata
>      edata
>      data
>      ata
>      ta
>      a
>
> 3. Create a temporary view for the exact search being done (problem: forces
>    a read through all documents in the database)
>
> Is there some other option I have overlooked, such as filtering the view
> server-side somehow?
>
> Thanks,
>
> Brian.
>
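
For reference, option 2 in the quoted message (a view that enumerates every suffix) could be sketched roughly as below. With such a view, a LIKE '%foo%' search turns into an ordinary startkey/endkey prefix range over the suffix keys, at the cost of one row per character of the field. The field name field_to_search, the doc._id value, and the local emit() stand-in are assumptions for illustration; inside CouchDB the view server supplies emit().

```javascript
// Stand-in for CouchDB's emit(), so the sketch can run outside the view server.
var rows = [];
function emit(key, value) {
    rows.push({ key: key, value: value });
}

// Map function sketch: emit every suffix of the field as a key.
// "somedata" yields somedata, omedata, medata, ... , a (8 rows).
function mapSuffixes(doc) {
    if (typeof doc.field_to_search === "string") {
        var s = doc.field_to_search;
        for (var i = 0; i < s.length; ++i) {
            emit(s.substr(i), doc._id);
        }
    }
}
```

A document can match the same search through more than one suffix (for example "foofoo" and the pattern "foo"), so the client would still need to de-duplicate the returned ids.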