couchdb-user mailing list archives

From Suraj Kumar <suraj.ku...@inmobi.com>
Subject Sub string search (and complex queries) approach review needed
Date Tue, 15 Oct 2013 13:53:53 GMT
Hi,

I'd like to let my users do substring searches on arbitrary document
attributes on the fly. Luckily, most of these attributes are enum-like,
i.e. they take values from a small, finite set.

What is the best way to achieve this? Is it possible to avoid writing any
middleware altogether? How easy would it be to do this in Erlang, given
that I'm a complete Erlang novice?

I have outlined a middleware approach below. I'd greatly appreciate your
input on whether there is a better approach.

To support substring search on arbitrary attributes on the fly, I intend
to write a middleware API that, together with a set of view functions,
makes concurrent calls for specific keys, merges the results, and sends
them back:

1. Build one view for each attribute on which I'd like to enable
substring search. Each view returns the list of unique values for that
attribute through map-reduce.
2. Write a middleware search API that does the following:

   a. Take attribute A and substring S as inputs.
   b. Call the view described above to get the list of unique values for
attribute A (i.e., call ".../_view/get_unique_values_of_" + A).
   c. From those values, keep the subset of items that contain S.
   d. For each full_key in that subset, make concurrent view API calls with
?key=full_key.
   e. Merge the results from these concurrent streams in sorted order
(taking advantage of the fact that view results for a given key are
already sorted) and return them to the caller incrementally, as soon as
appropriate. Assuming the gap between the data sets is not large, the
middleware will buffer roughly no more than GAP elements internally before
sending results out. I'm using Node.js for the middleware.

I'm also building this API so that clients can later run complex queries
(and/or and other compound rules), because our users demand it. I intend to
make the API pseudo-compatible with the CouchDB ?key="..." parameter, except
that the string passed as the key value will be a compound and/or rule (like
"key1=value1&key2=value2"). Perhaps CouchDB is a bad choice for this kind of
SQL-like querying, but it shines on all the other fronts of my requirements,
so I decided to make do with an approach like this one.
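A compound rule such as "key1=value1&key2=value2" could be handled by treating each `attr=value` term as a set of matching document ids and combining the sets: intersection for AND, union for OR. In the sketch below, using `|` for OR (with `&` binding tighter) is a hypothetical extension, since only `&` appears in the example rule; escaping of `&`, `|` and `=` inside values is not handled. The `lookup` function is also an assumption: it would return the ids from an exact-key query on that attribute's view.

```typescript
type Term = { attr: string; value: string };
// A rule is an OR of AND-clauses (disjunctive normal form).
type Rule = Term[][];

// Parse "key1=value1&key2=value2". "|" as OR is a hypothetical extension;
// "&" binds tighter than "|". No escaping is supported.
function parseRule(rule: string): Rule {
  return rule.split("|").map((clause) =>
    clause.split("&").map((pair) => {
      const eq = pair.indexOf("=");
      if (eq <= 0) throw new Error(`bad term: ${pair}`);
      return { attr: pair.slice(0, eq), value: pair.slice(eq + 1) };
    }),
  );
}

// AND = intersect the id sets of a clause's terms; OR = union the clauses.
// lookup(attr, value) is assumed to return the ids of matching documents.
function evaluateRule(
  rule: Rule,
  lookup: (attr: string, value: string) => Set<string>,
): Set<string> {
  const result = new Set<string>();
  for (const clause of rule) {
    let ids: Set<string> | null = null;
    for (const term of clause) {
      const hits = lookup(term.attr, term.value);
      ids = ids === null ? new Set(hits) : new Set(Array.from(ids).filter((id) => hits.has(id)));
    }
    if (ids) ids.forEach((id) => result.add(id));
  }
  return result;
}
```

Intersecting in memory is simple but fetches every term's full id set; with enum-like attributes that may be acceptable, whereas a very unselective term would make this expensive.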

Awaiting valuable feedback from the community.

Regards,

  -Suraj

-- 
An Onion is the Onion skin and the Onion under the skin until the Onion
Skin without any Onion underneath.
