couchdb-user mailing list archives

From Keith Gable <ziggythehams...@gmail.com>
Subject Re: Searching a view using a partial match on multiple key parts
Date Sun, 24 Oct 2010 18:04:53 GMT
On Oct 24, 2010, at 12:09 PM, John Logsdon <johnlogsdon@me.com> wrote:

> Hi Keith
>
> Very helpful. I hadn't seen/realised that I could do multi-part
> indexes with startkey and endkey.
>
> Just to confirm:
>
> 1. startkey=myp, endkey=myp\u9999 is the CouchDB equivalent of an
> SQL "LIKE 'myp%'"

Sort of. Just don't think of it as an SQL equivalent. You want the
entire range of characters that can occur after "myp", which runs from
nothing ("myp") up to a very high Unicode character ("myp\u9999"; many
CouchDB examples use "\ufff0" for the same purpose). In this case the
technique behaves very similarly to %, but it also allows you to chunk
your query alphabetically: myp0 to myp9\u9999 is 0-9, mypa to
mypa\u9999 is all of the A's, etc.

The \u9999 trick applies to strings only.
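For illustration, here is a minimal sketch in JavaScript (the language CouchDB map functions are written in) of turning a prefix into startkey/endkey parameters. The helper names prefixRange and toQueryString are hypothetical. What it shows is that view parameters are JSON values, so a string key has to be quoted as JSON and then URL-encoded before it goes into the query string.

```javascript
// Hypothetical helper: build a "starts with" range for a string-keyed view.
// The default high sentinel follows the message above ("\u9999"); pass
// another one (for example "\ufff0") if your keys may contain characters
// that sort higher.
function prefixRange(prefix, high = "\u9999") {
  return { startkey: prefix, endkey: prefix + high };
}

// View query parameters are JSON: encode each value as JSON first, then
// URL-encode it for the query string.
function toQueryString(params) {
  return Object.entries(params)
    .map(([name, value]) => `${name}=${encodeURIComponent(JSON.stringify(value))}`)
    .join("&");
}

// Everything starting with "myp":
console.log(toQueryString(prefixRange("myp")));
// startkey=%22myp%22&endkey=%22myp%E9%A6%99%22

// Only names that continue with "a" after "myp":
console.log(toQueryString(prefixRange("mypa")));
```

The resulting string would be appended to a view URL of the form /{db}/_design/{ddoc}/_view/by_name?..., where the database and design document names are placeholders.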

> 2. startkey = 0, endkey = {} is the CouchDB equivalent of an SQL
> "LIKE '%'"

In CouchDB's collation order, 0 sorts before any string (only null,
booleans and negative numbers come earlier), and a hash/object sorts
after every other type. If you want everything with a key whose first
item is "abc", you need to say ["abc", 0] to ["abc", {}].

Think of 0 and {} as the lower and upper bounds for one position in
the array.
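To see why ["abc", 0] and ["abc", {}] bracket the keys whose first item is "abc", here is a deliberately simplified sketch of CouchDB's collation rules in JavaScript: null, then false, then true, then numbers, strings, arrays (compared item by item) and finally objects. Two simplifications are assumptions of the sketch, not CouchDB behaviour: strings are compared by UTF-16 code unit rather than with CouchDB's ICU-based collation, and all objects compare as equal. The names typeRank, collate and inRange are hypothetical.

```javascript
// Hypothetical, simplified model of CouchDB view collation.
// Simplifications: strings compare by UTF-16 code unit (CouchDB uses
// ICU-based collation) and all objects compare as equal.
function typeRank(value) {
  if (value === null) return 0;
  if (value === false) return 1;
  if (value === true) return 2;
  if (typeof value === "number") return 3;
  if (typeof value === "string") return 4;
  if (Array.isArray(value)) return 5;
  return 6; // objects sort after everything else
}

function collate(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 3) return a - b;
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 5) {
    // Compare item by item; if one array is a prefix of the other,
    // the shorter one sorts first.
    const n = Math.min(a.length, b.length);
    for (let i = 0; i < n; i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  return 0; // same null/boolean value, or two objects (simplified)
}

// startkey and endkey are both inclusive by default in CouchDB.
function inRange(key, startkey, endkey) {
  return collate(startkey, key) <= 0 && collate(key, endkey) <= 0;
}

const keys = [
  ["abb", "z"],
  ["abc", null],
  ["abc", "myplan"],
  ["abc", "zzz"],
  ["abc", 42],
  ["abd", "a"]
];

console.log(keys.filter(key => inRange(key, ["abc", 0], ["abc", {}])));
// Keeps ["abc", "myplan"], ["abc", "zzz"] and ["abc", 42]; ["abc", null]
// is left out because null sorts before 0.
```

Sorting the same keys with collate puts ["abc", 42] ahead of the string-valued "abc" keys, which is why 0 works as a lower bound for names.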

>
> My only other issue is that my indexes, even simple ones such as
> emit(doc.name, 1), have produced index files far larger than the
> database file. Do you have any idea why that might be?

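It's just how it goes. CouchDB stores the views of each design
document in a separate index file, in a different format from the
database so that keys can be looked up by range. Like the database
file, the index file is append-only, so it keeps growing until it is
compacted.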

>
> Regards
>
> John
>
> On 24 Oct 2010, at 17:46, Keith Gable wrote:
>
>> You'd use multiple indexes:
>>
>> On Sun, 2010-10-24 at 11:53 +0100, John Logsdon wrote:
>>> Hi
>>>
>>> I have an index that has three 'groups' to represent an Account
>>> Name, a Contained entity name and a contained entity type e.g.  
>>> {"account":"johnl", "name":"myplan", "type":"plan"}
>>>
>>> I'm after the equivalent of a startkey endkey but for a composite  
>>> index so I could do the following types of queries:
>>>
>>> 1) Search across all Accounts for any Entity type starting
>>> "myp" (This supports ajax search as the user starts typing in the
>>> search box)
>>>
>>> e.g. Account = *, Type = *, Name starts with myp
>>
>> by_name:
>>
>> emit(doc.name, 1);
>>
>> startkey="myp"
>> endkey="myp\u9999"
>>
>>>
>>> 2) Search a list of Accounts for any Entity type starting "myp"
>>>
>>> e.g. Account in johnl, mycompany, myreseller, global, Type = *,  
>>> Name starts with myp
>>
>> by_account_and_name:
>>
>> emit([doc.account, doc.name], 1);
>>
>> query once for every company and then merge the results in your
>> application:
>>
>> startkey=["johnl", "myp"]
>> endkey=["johnl", "myp\u9999"]
>> (etc.)
>>
>>>
>>>
>>> 3) Search for all plans named "myplan" in all accounts
>>>
>>> e.g. Account = *, Type = Plan, Name = myplan
>>
>> by_type_and_name:
>>
>> emit([doc.type, doc.name], 1);
>>
>> startkey=["Plan", "myplan"]
>> endkey=["Plan", "myplan"]
>>
>> (or, equivalently, use key=["Plan", "myplan"])
>>
>>>
>>>
>>> 4) Search a list of Accounts for all plans
>>>
>>> e.g. Account = *, Type = Plan, Name = *
>>>
>>
>> Use by_type_and_name:
>>
>> startkey=["Plan", 0]
>> endkey=["Plan", {}]
>>
>>>
>>> 5) Search a List of Accounts for all contained entities
>>>
>>> e.g. Account in johnl, mycompany, myreseller, global, Type = *,  
>>> Name = *
>>
>> Use by_account_and_name and query once per account (or make a new
>> view called by_account that emits just the account, and query it
>> with key="johnl"):
>>
>> startkey=["johnl", 0]
>> endkey=["johnl", {}]
>>
>> and merge it with the results from the other accounts.
>>
>>
>>
>>
>> If you need something significantly more complex, like type = x,
>> name = x, y, or z, account is in unpaid status, and not a corporate
>> account, or something else that can't really be mapped to a
>> key-value methodology, then you'll probably need to check out
>> CouchDB-Lucene.
>>
>>
>> P.S. I emit 1 as the value so that I have an easy way to count the
>> results. If you have something better (dollars? hits?), then you
>> should emit that instead. I don't see the point in emitting the ID,
>> because you can use include_docs=true to get the documents, and each
>> view row already includes the document ID.
>>
>
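The three views suggested in the quoted reply could live in a single design document. Below is a minimal sketch; the design document ID "_design/entities", the sample documents and the runMap harness are all hypothetical. The harness only exists to try the map functions locally in Node, since inside CouchDB the view server supplies emit() itself. The map functions check for the fields they use, so documents without them emit nothing. The sample uses "plan" in lowercase, as in John's example document; the quoted reply's "Plan" would only match documents that store it that way.

```javascript
// Hypothetical design document holding the three views from the quoted reply.
// CouchDB stores map functions as source strings, hence the .toString() calls.
const designDoc = {
  _id: "_design/entities",
  views: {
    by_name: {
      map: function (doc) {
        if (doc.name) emit(doc.name, 1);
      }.toString()
    },
    by_account_and_name: {
      map: function (doc) {
        if (doc.account && doc.name) emit([doc.account, doc.name], 1);
      }.toString()
    },
    by_type_and_name: {
      map: function (doc) {
        if (doc.type && doc.name) emit([doc.type, doc.name], 1);
      }.toString()
    }
  }
};

// Local harness only: rebuild a map function from its source with a fake
// emit() that records rows as { id, key, value }, like view rows.
function runMap(source, docs) {
  const rows = [];
  let currentId = null;
  const emit = (key, value) => rows.push({ id: currentId, key, value });
  const map = new Function("emit", `return (${source});`)(emit);
  for (const doc of docs) {
    currentId = doc._id;
    map(doc);
  }
  return rows;
}

// Made-up sample documents shaped like the example in the question.
const docs = [
  { _id: "a1", account: "johnl", name: "myplan", type: "plan" },
  { _id: "a2", account: "mycompany", name: "mypage", type: "page" },
  { _id: "a3", account: "global", name: "other", type: "plan" }
];

// Query 1 (by_name, startkey="myp", endkey="myp\u9999"), simulated with plain
// string comparison and sorted by key, the way CouchDB returns rows.
const byKey = (a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0);
const query1 = runMap(designDoc.views.by_name.map, docs)
  .filter(row => row.key >= "myp" && row.key <= "myp\u9999")
  .sort(byKey);
console.log(query1.map(row => row.id)); // [ 'a2', 'a1' ]

// Query 3 (by_type_and_name, key=["plan", "myplan"]): an exact match.
// String keys must match the stored value, including case.
const wanted = JSON.stringify(["plan", "myplan"]);
const query3 = runMap(designDoc.views.by_type_and_name.map, docs)
  .filter(row => JSON.stringify(row.key) === wanted);
console.log(query3.map(row => row.id)); // [ 'a1' ]
```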

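For queries 2 and 5, the quoted reply suggests one range query per account against by_account_and_name, followed by a merge in the application. A minimal sketch of both halves follows; the function names are hypothetical, and the response objects are made-up stand-ins for parsed view responses (each row carries the document id, as real view rows do). It also shows the point of emitting 1: summing the values counts the rows.

```javascript
// Accounts from the question.
const accounts = ["johnl", "mycompany", "myreseller", "global"];

// Query 2: names starting with `prefix` within one account.
function prefixQuery(account, prefix) {
  return { startkey: [account, prefix], endkey: [account, prefix + "\u9999"] };
}

// Query 5: every entity in one account (0 sorts before strings, {} after).
function allQuery(account) {
  return { startkey: [account, 0], endkey: [account, {}] };
}

// JSON-encode, then URL-encode, each view parameter.
function toQueryString(params) {
  return Object.entries(params)
    .map(([name, value]) => `${name}=${encodeURIComponent(JSON.stringify(value))}`)
    .join("&");
}

// One query string per account; each would be sent to by_account_and_name.
const query5Strings = accounts.map(account => toQueryString(allQuery(account)));
console.log(query5Strings[0]);
// startkey=%5B%22johnl%22%2C0%5D&endkey=%5B%22johnl%22%2C%7B%7D%5D

// Merge rows from several responses and order them by name (second key part).
function mergeRows(responses) {
  const byName = (a, b) => (a.key[1] < b.key[1] ? -1 : a.key[1] > b.key[1] ? 1 : 0);
  return responses.flatMap(response => response.rows).sort(byName);
}

// Every emitted value is 1, so the sum of the values is the row count.
const countRows = rows => rows.reduce((total, row) => total + row.value, 0);

// Made-up responses for prefixQuery("johnl", "myp") and prefixQuery("mycompany", "myp").
const responses = [
  { rows: [{ id: "a1", key: ["johnl", "myplan"], value: 1 }] },
  { rows: [
      { id: "a2", key: ["mycompany", "mypage"], value: 1 },
      { id: "a4", key: ["mycompany", "myproject"], value: 1 }
  ] }
];
const merged = mergeRows(responses);
console.log(merged.map(row => row.id)); // [ 'a2', 'a1', 'a4' ]
console.log(countRows(merged)); // 3
```

Alternatively, adding "reduce": "_sum" (a built-in CouchDB reduce function) to the view lets CouchDB compute such totals for a key range itself; query with reduce=false to get the rows instead.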