Message-ID: <469509464.1210608895583.JavaMail.jira@brutus>
Date: Mon, 12 May 2008 09:14:55 -0700 (PDT)
From: "Jun Rao (JIRA)"
To: couchdb-dev@incubator.apache.org
Subject: [jira] Created: (COUCHDB-53) Incorporating JSearch to CouchDB

Incorporating JSearch to CouchDB
--------------------------------

Key: COUCHDB-53
URL: https://issues.apache.org/jira/browse/COUCHDB-53
Project: CouchDB
Issue Type: New Feature
Components: Full-Text Search
Environment: JSearch is developed in Java
Reporter: Jun Rao

JSearch
is a prototype that we developed for indexing and searching JSON documents, and we are enthusiastic about contributing it to CouchDB.

JSearch converts a given JSON document to a Lucene document for indexing. The conversion is lossless and preserves all structural information in the original JSON document. We achieve that by storing the encoding of JSON structures in the payload of the posting list in a Lucene index.

JSearch has a simple query language that combines full-text search and structural querying. To qualify as a match, a document has to match both the JSON structure and the Boolean constraints specified in the query. Suppose that we have indexed the following two JSON documents:

d1={ A: [ { B: "b1", C: "c1" },
          { B: "b2", C: "c2" } ] }

d2={ A: [ { B: "b1", C: "c2" },
          { B: "b2", C: "c1" } ] }

One can issue the following two JSearch queries:

P={ A: [ { B: "b1" && C: "c1" } ] }
Q={ A: [ { B: "b1" } && { C: "c1" } ] }

Query P ("&&" specifies conjunction) matches d1, but not d2, because no single JSON object in d2 has both B equal to "b1" and C equal to "c1". Query Q, on the other hand, matches both d1 and d2, since it doesn't require the B field and the C field to be in the same JSON object.

Here is a summary of the querying features in JSearch:
1. arbitrary conjunctive and disjunctive constraints
2. text search on atomic values of string type
3. range constraints on atomic values (only those of string and long types are currently supported)
4. document-level matching

The easiest way to learn more about JSearch is to give it a try. Download the attached tgz file, follow the readme file in it, and try some of the examples. The attachment also includes all Java source code (I can provide more technical details if needed).

I am very interested in your feedback. Does JSearch fit into CouchDB? What other features are needed? How should JSearch be integrated (from Jan's mail, it seems that some infrastructure is already in place)?
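
The difference between queries P and Q above can be made concrete with a minimal sketch. It is written in plain Python over already-parsed JSON and is only an illustration of the matching semantics described in this message; it is not how JSearch evaluates queries (JSearch works against a Lucene index), and the helper names has_field, matches_p and matches_q are made up for this example.

```python
# Illustrative only: plain Python over parsed JSON, not JSearch's
# Lucene-based implementation. All function names are hypothetical.

d1 = {"A": [{"B": "b1", "C": "c1"}, {"B": "b2", "C": "c2"}]}
d2 = {"A": [{"B": "b1", "C": "c2"}, {"B": "b2", "C": "c1"}]}


def has_field(obj, key, value):
    """True if obj is a JSON object whose field `key` equals `value`."""
    return isinstance(obj, dict) and obj.get(key) == value


def matches_p(doc):
    """P = { A: [ { B: "b1" && C: "c1" } ] }

    A single element of A must satisfy both constraints.
    """
    return any(
        has_field(item, "B", "b1") and has_field(item, "C", "c1")
        for item in doc.get("A", [])
    )


def matches_q(doc):
    """Q = { A: [ { B: "b1" } && { C: "c1" } ] }

    Each constraint may be satisfied by a different element of A.
    """
    items = doc.get("A", [])
    return any(has_field(item, "B", "b1") for item in items) and any(
        has_field(item, "C", "c1") for item in items
    )


for name, doc in (("d1", d1), ("d2", d2)):
    print(name, "P:", matches_p(doc), "Q:", matches_q(doc))
```

Running it prints "d1 P: True Q: True" and "d2 P: False Q: True", which agrees with the description above: P needs B and C in the same object, while Q only needs each of them somewhere in the array A.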
Thanks,

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.