Date: Thu, 17 Mar 2011 12:43:35 +0000 (UTC)
From: "Daniel Spicar (JIRA)"
To: clerezza-dev@incubator.apache.org
Subject: [jira] Commented: (CLEREZZA-388) Composite Resource Index Service

[
https://issues.apache.org/jira/browse/CLEREZZA-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007902#comment-13007902 ]

Daniel Spicar commented on CLEREZZA-388:
----------------------------------------

I'd like to give some feedback based on experience with a concrete use case.

The use case is a web site with a search interface for finding users on the platform. The search should behave "intuitively": when I enter "jessica", I expect to get all users whose name string contains "jessica" as a single word.

A rough specification is:
- exact string matching with double quotes ("phrase")
- wildcard matching (*, ?)
- case-insensitive search ('jessica' and 'Jessica' should deliver the same results)
- boolean conditions for search terms (AND, OR, NOT)

Lucene provides a QueryParser that supports most of these features and more (fuzzy searches, range searches, etc.); see http://lucene.apache.org/java/3_0_0/queryparsersyntax.html

I therefore implemented my own Condition that runs the QueryParser on the user input to build a query. However, I ran into two problems that need to be resolved in CRIS:

1. CRIS indexes named resources with the Field.Index.NOT_ANALYZED attribute. This means the value is not tokenized (it is indexed as a single term) and matching is case-sensitive.
2. CRIS is currently hard-coded to deliver the top 10 results. For this use case, the number of results needs to be configurable.

Concerning problem 1:
I resolved it locally by adding another field to the indexed document:

    doc.add(new Field(vProperty.stringKey, propertyValue, Field.Store.YES, Field.Index.ANALYZED));

Because CRIS uses the StandardAnalyzer, the words in this new field are tokenized, common English stop words (like "a") are omitted, and the indexed terms are lower-cased. As a result, there is now one field with the exact value and another field with a lower-case, tokenized index of the same value.
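To make the two-field idea concrete, here is a dependency-free toy model in plain Java. It is not Lucene and not the CRIS code: the hypothetical `analyze` method only approximates what the StandardAnalyzer does (split on non-alphanumeric characters, lower-case, drop a small illustrative stop-word list), and the class name `TwoFieldIndexSketch`, the methods `findExact`/`findAnalyzed`, the `urn:user:*` resources and the AND semantics for multi-word queries are all assumptions made up for illustration. The sketch shows why an exact, untokenized field cannot find "jessica" inside "Jessica Smith" while an analyzed field can, provided the query text goes through the same analyzer used at index time, and it takes the result limit as a parameter instead of hard-coding 10.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model (NOT Lucene) of keeping an exact value and an analyzed copy per resource.
class TwoFieldIndexSketch {

    // Small illustrative stop-word list; Lucene's English list is longer.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of"));

    // Hypothetical analyzer, shared by indexing and querying:
    // lower-case, split on non-letter/non-digit characters, drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    // resource -> exact value (like a NOT_ANALYZED field)
    private final Map<String, String> exactField = new LinkedHashMap<>();
    // resource -> analyzed tokens of the same value (like an ANALYZED field)
    private final Map<String, List<String>> analyzedField = new LinkedHashMap<>();

    void add(String resource, String value) {
        exactField.put(resource, value);
        analyzedField.put(resource, analyze(value));
    }

    // Exact lookup: case-sensitive, the whole value must match.
    List<String> findExact(String value, int maxResults) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : exactField.entrySet()) {
            if (hits.size() >= maxResults) break;
            if (e.getValue().equals(value)) hits.add(e.getKey());
        }
        return hits;
    }

    // Analyzed lookup: every query token must occur in the resource's tokens.
    List<String> findAnalyzed(String query, int maxResults) {
        List<String> queryTokens = analyze(query); // same analyzer as at index time
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : analyzedField.entrySet()) {
            if (hits.size() >= maxResults) break;
            if (!queryTokens.isEmpty() && e.getValue().containsAll(queryTokens)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TwoFieldIndexSketch index = new TwoFieldIndexSketch();
        index.add("urn:user:1", "Jessica Smith");
        index.add("urn:user:2", "Anna Jessica Meier");
        index.add("urn:user:3", "John Doe");

        System.out.println(index.findExact("jessica", 10));    // prints []
        System.out.println(index.findAnalyzed("jessica", 10)); // prints [urn:user:1, urn:user:2]
        System.out.println(index.findAnalyzed("JESSICA", 1));  // prints [urn:user:1]
    }
}
```

In Lucene itself, the counterpart of sharing `analyze` between indexing and querying is constructing the QueryParser with the same Version and Analyzer that the IndexWriter used.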
A consequence of this solution is that it would be good if the GraphIndexer exposed the Lucene Version and the Analyzer it uses on its public interface, so that custom conditions (like mine) can use the same Analyzer that the index was written with.

I'll attach the GenericCondition, GraphIndexer and ResourceFinder files for reference. Note that it is not production-level code.

> Composite Resource Index Service
> --------------------------------
>
> Key: CLEREZZA-388
> URL: https://issues.apache.org/jira/browse/CLEREZZA-388
> Project: Clerezza
> Issue Type: New Feature
> Reporter: Reto Bachmann-Gmür
> Assignee: Reto Bachmann-Gmür
>
> A service shall monitor a graph for resources of a specific type and provide composite indexes on specified properties. It shall support searching by exact value, by range, as well as full-text search. This service shall make it possible to provide fast faceted searches.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira