incubator-couchdb-user mailing list archives

From John Bartak <john.bar...@autodesk.com>
Subject RE: Newbie question: substring matching
Date Thu, 22 Jan 2009 17:26:11 GMT
I implemented something similar recently, except that I wanted to find any word that started
with a given substring. What I ended up doing was emitting the first 3 characters of each word
in the field as the key. I still had to do quite a bit of processing on the client side once I
got the list of potential matches back from CouchDB.


The following map function outputs the first three characters of each word in the Name field
of all Person objects:

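function(doc) {
  // Emit one row per word in doc.Name, keyed on the word's first three
  // characters (lower-cased); the value is the whole document.
  function emitParts(parts, doc) {
    for (var i = 0; i < parts.length; ++i) {
      // Skip empty strings produced by leading or trailing whitespace.
      if (parts[i].length > 0) {
        emit(parts[i].substr(0, 3).toLowerCase(), doc);
      }
    }
  }

  if (doc.type == "Person" && typeof doc.Name == "string") {
    // Split on runs of whitespace, not just a single space.
    emitParts(doc.Name.split(/\s+/), doc);
  }
}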
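The client-side step can be sketched roughly as follows. This is an illustration, not John's actual code: it assumes the view rows have already been fetched (here a hypothetical `buildRows` builds them in memory, mirroring the map function above) and that each document has an `_id`. A prefix of three or more characters corresponds to an exact key lookup on its first three characters; a shorter prefix would need a key range instead, which the sketch imitates with `startsWith`.

```javascript
// Hypothetical stand-in for the view: mirrors the map function, producing
// { key, value } rows keyed on the lower-cased first three characters of each word.
function buildRows(docs) {
  const rows = [];
  for (const doc of docs) {
    if (doc.type !== "Person" || typeof doc.Name !== "string") continue;
    for (const word of doc.Name.split(/\s+/)) {
      if (word.length > 0) {
        rows.push({ key: word.slice(0, 3).toLowerCase(), value: doc });
      }
    }
  }
  return rows;
}

// Client-side filtering: keep documents that have at least one word starting
// with the full prefix, returning each document once.
function findByWordPrefix(rows, prefix) {
  const p = prefix.toLowerCase();
  const key = p.slice(0, 3);
  const seen = new Set();
  const results = [];
  for (const row of rows) {
    // 3+ characters: exact key match; shorter: any key beginning with the prefix.
    const candidate = p.length >= 3 ? row.key === key : row.key.startsWith(p);
    if (!candidate || seen.has(row.value._id)) continue;
    const words = row.value.Name.toLowerCase().split(/\s+/);
    if (words.some((w) => w.startsWith(p))) {
      seen.add(row.value._id);
      results.push(row.value);
    }
  }
  return results;
}
```

With hypothetical documents named "John Smith" and "Johanna Jones", a search for "john" hits both on the key "joh"; only the client-side word check rejects the second.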


-----Original Message-----
From: Paul Davis [mailto:paul.joseph.davis@gmail.com]
Sent: Thursday, January 22, 2009 9:09 AM
To: user@couchdb.apache.org
Subject: Re: Newbie question: substring matching

You'll never be able to put a wildcard at the start of your pattern
with CouchDB directly, and you'll only be able to have a wildcard
at one end of the pattern.

Something you could try:

emit(doc.field_to_search, value);
// Reversed copy: a prefix search on this key is a suffix search on the field.
emit(doc.field_to_search.split("").reverse().join(""), value);
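One caveat if both emits go into the same view: the range query for "oof" also scans forward keys, so a string such as "oofoof" (which starts with "oof" and, reversed, starts with "foo") would survive the intersection without starting with "foo". A minimal sketch that keeps the two orderings apart by tagging each key, assuming a hypothetical field name `field_to_search` and a stubbed `emit` so it can run outside CouchDB:

```javascript
// CouchDB provides emit() inside map functions; this stub just collects rows.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function: tag each key so forward and reversed entries occupy
// separate ranges of the view (["fwd", ...] vs ["rev", ...]).
const map = function (doc) {
  if (typeof doc.field_to_search !== "string") return;
  var s = doc.field_to_search;
  emit(["fwd", s], null);
  emit(["rev", s.split("").reverse().join("")], null);
};
```

The queries would then use startkey=["fwd","foo"]&endkey=["fwd","foo\u9999"] and startkey=["rev","oof"]&endkey=["rev","oof\u9999"], since CouchDB compares array keys element by element. Note that `split("").reverse()` reverses UTF-16 code units, so characters outside the Basic Multilingual Plane would be mangled.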

Then you could do something like:

http://127.0.0.1:5984/db_name/_view/ddoc/using_like?startkey="foo"&endkey="foo\u9999"
http://127.0.0.1:5984/db_name/_view/ddoc/using_like?startkey="oof"&endkey="oof\u9999"
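Building those URLs by hand is error-prone, because the keys are JSON values that also need URL-encoding. A small helper (hypothetical, not part of any CouchDB client library) that produces the same kind of prefix range as the URLs above:

```javascript
// Build a view URL that selects every key starting with `prefix`.
// Keys are JSON-encoded, then URL-encoded; the sentinel defaults to the
// "\u9999" used above ("\ufff0" is another common choice).
function prefixRangeUrl(viewUrl, prefix, sentinel = "\u9999") {
  const startkey = encodeURIComponent(JSON.stringify(prefix));
  const endkey = encodeURIComponent(JSON.stringify(prefix + sentinel));
  return `${viewUrl}?startkey=${startkey}&endkey=${endkey}`;
}

// Reverse a string so that a suffix search becomes a prefix search.
function reverse(s) {
  return s.split("").reverse().join("");
}
```

For the second query, pass `reverse("foo")` as the prefix.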

And then intersect the two sets client-side. Other than that, I'd look
at integrating full-text search.
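The client-side intersection might look like this, assuming each query's JSON response has been parsed and its `rows` array (each row carries the document `id`) passed in; the function name is hypothetical:

```javascript
// Return the ids present in both sets of view rows, each id once,
// in the order they first appear in rowsA.
function intersectById(rowsA, rowsB) {
  const inB = new Set(rowsB.map((row) => row.id));
  const result = [];
  const seen = new Set();
  for (const row of rowsA) {
    if (inB.has(row.id) && !seen.has(row.id)) {
      seen.add(row.id);
      result.push(row.id);
    }
  }
  return result;
}
```

Building a Set from the second list keeps this linear in the total number of rows. The deduplication matters because one document can emit several keys that fall inside the same range.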

HTH,
Paul Davis

On Thu, Jan 22, 2009 at 11:56 AM, Brian Candler <B.Candler@pobox.com> wrote:
> Suppose I have a view which indexes a single field. Using startkey and
> endkey, it's easy to find matches which start with a particular pattern.
>
> But I'm wondering how best to do substring matches (in SQL: LIKE '%foo%')
>
> I could:
>
> 1. Read the entire view, and filter it client-side (problem: large
>   data transfer)
>
> 2. Create another view which enumerates all possible suffixes (problem:
>   large index, O(N^2))
>
> somedata
> omedata
> medata
> edata
> data
> ata
> ta
> a
>
> 3. Create a temporary view for the exact search being done (problem: forces
>   a read through all documents in the database)
>
> Is there some other option I have overlooked, such as filtering the view
> server-side somehow?
>
> Thanks,
>
> Brian.
>
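Brian's option 2 can be written as a map function that emits every suffix; a prefix range query on that view (startkey "med", endkey "med" plus a high sentinel character) then matches any document whose field contains "med". A sketch assuming a hypothetical field name `field`, lower-casing for case-insensitive matching, and a stubbed `emit` so it runs outside CouchDB:

```javascript
// CouchDB provides emit() inside map functions; this stub just collects rows.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Emit one row per suffix: "somedata", "omedata", ..., "a".
const map = function (doc) {
  if (typeof doc.field !== "string") return;
  var s = doc.field.toLowerCase();
  for (var i = 0; i < s.length; i++) {
    emit(s.substring(i), null);
  }
};
```

A string of length n produces n keys totalling n(n+1)/2 characters (36 for "somedata"), which is the O(N^2) growth Brian mentions. A document whose field contains the substring more than once appears once per occurrence, so results need deduplicating by id.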
