Date: Tue, 15 Oct 2013 19:23:53 +0530
Subject: Sub string search (and complex queries) approach review needed
From: Suraj Kumar
To: user@couchdb.apache.org

Hi,

I'd like to enable my users to do substring search on arbitrary attributes of documents on the fly. Luckily, most of the attributes of the documents are enum-like, or take a finite, small range of values.

How do we best achieve this? Is it possible to avoid writing any middleware altogether? How easy would it be to achieve this in Erlang, assuming I'm a complete Erlang novice?

I have a 'middleware' approach, which I have outlined below. Your input on whether there is a better approach than this would be highly appreciated.

To achieve substring search on arbitrary attributes on the fly, I intend to write a middleware API which, in combination with a set of view functions, will make concurrent specific-key calls, merge the results, and send them back:

1. Build one view for each attribute by which I'd like to enable substring search. This view will return the list of unique values for that attribute through a map-reduce.

2. Write a middleware Search API which will do the following:

   a. Given attribute A and substring S as inputs...
   b. Call the above-mentioned view to get the unique list of values for attribute A (i.e., call ".../_view/get_unique_values_of_" + A).
   c.
      For each item in the above values, find the subset of values where substr(item, S) = true.
   d. For each full_key in the subset, make concurrent View API calls with ?key=full_key.
   e. Merge the results from these 'concurrent streams' in sorted order (taking advantage of the fact that the results from views are already sorted for a given key) and return them in-situ to the caller whenever appropriate. Assuming the 'gap' between data sets is not large, the middleware will buffer no more than roughly GAP elements internally before sending the results out.

I'm using Node.js for the middleware.

The reason I'm building this API is also to make it possible for clients to do complex queries later (and/or/etc., compound rules), because our users demand it. I intend to make the API pseudo-compatible with the CouchDB API's ?key="..." (except that the string passed as the key value will be a complex and/or rule, like "key1=value1&key2=value2").

Perhaps CouchDB is a bad choice for this kind of SQL-like querying need... but CouchDB shines on all the other fronts of my requirements, so I decided to make do with some such approach.

Awaiting valuable feedback from the community.

Regards,

  -Suraj

-- 
An Onion is the Onion skin and the Onion under the skin until the Onion Skin without any Onion underneath.
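For concreteness, steps 2c to 2e of the middleware could be sketched roughly as follows in TypeScript on Node.js. This is only an illustration under assumptions: the `Row` shape, the `FetchRows` callback (a stand-in for the per-key View API call) and all function names are hypothetical, and the sketch collects each per-key result fully in memory before merging instead of streaming with a bounded buffer.

```typescript
// Hypothetical row shape; only `id` and `key` are used in this sketch.
interface Row {
  id: string;
  key: string;
}

// Hypothetical stand-in for the per-key View API call (?key=full_key).
// Assumed to resolve to the rows for one exact key, already sorted by id.
type FetchRows = (key: string) => Promise<Row[]>;

// Step 2c: keep only the unique values that contain the substring.
function filterBySubstring(values: string[], s: string): string[] {
  return values.filter((v) => v.includes(s));
}

// Step 2e: k-way merge of lists that are each already sorted under `cmp`.
function mergeSorted<T>(lists: T[][], cmp: (a: T, b: T) => number): T[] {
  const pos: number[] = lists.map(() => 0); // read cursor per list
  const out: T[] = [];
  for (;;) {
    let best = -1; // index of the list whose head is currently smallest
    for (let i = 0; i < lists.length; i++) {
      if (pos[i] >= lists[i].length) continue; // this list is exhausted
      if (best === -1 || cmp(lists[i][pos[i]], lists[best][pos[best]]) < 0) {
        best = i;
      }
    }
    if (best === -1) return out; // every list is exhausted
    out.push(lists[best][pos[best]]);
    pos[best]++;
  }
}

// Order rows by document id (an assumption about the desired output order).
function byId(a: Row, b: Row): number {
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

// Steps 2c-2e combined: filter keys, fetch concurrently, merge.
async function search(
  uniqueValues: string[],
  s: string,
  fetchRows: FetchRows
): Promise<Row[]> {
  const keys = filterBySubstring(uniqueValues, s);
  // Step 2d: one request per matching key, all issued concurrently.
  const lists = await Promise.all(keys.map((k) => fetchRows(k)));
  return mergeSorted(lists, byId);
}
```

The linear scan over list heads in `mergeSorted` is fine while the number of matching keys is small, which fits the enum-like attributes described above; a heap would be the usual replacement if that number grows.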