From: Ard Schrijvers
To: users@jackrabbit.apache.org, wilsonh@randdss.com
Date: Mon, 30 Aug 2010 09:30:22 +0200
Subject: Re: Problems with hyphen in JSR-170 XPath query using jcr:contains
In-Reply-To: <4C780C98.9050208@randdss.com>

Hello,

On Fri, Aug 27, 2010 at 9:06 PM, H. Wilson wrote:
>  OK, well I got the spaces part figured out, and will post it for anyone who
> needs it. Putting quotes around the spaces unfortunately did not work.
>  During testing, I determined that if you performed the following query for
> the exact fullName property:
>
>    filter.addContains ( @fullName,
> '"+Text.escapeIllegalXpathSearchChars(".North.South.East.West Land"));
>
> It would return nothing. But tweak it a little and add a wildcard, and it
> would return results:
>
>   filter.addContains ( @fullName,
>   '"+Text.escapeIllegalXpathSearchChars(".North.South.East.West Lan*"));

This does not make sense... see below.

>
> But since I did not want to throw in wild cards where they might not be
> wanted, if a search string contained spaces, did not contain wild cards and
> the user was not concerned with case sensitivity, I used the fn:lower-case.
> So I ended up with the following excerpt (our clients wanted options for
> case sensitive and case insensitive searching) .
>
> public OurParameter[] getOurParameters (boolean performCaseSensitiveSearch,
> String searchTerm, String srchField ) { //srchField in this case was
> fullName
>
>   .....
>
>   if ( performCaseSensitiveSearch) {
>
>       //jcr:like for case sensitive
>       filter.orJCRExpression ("jcr:like(@" + srchField +",
> '"+Text.escapeIllegalXpathSearchChars (searchTerm)+"')");
>
>   }
>   else {
>
>       //only use fn:lower-case if there is spaces, with NO wild cards
>
>       if ( searchTerm.contains (" ")&&  !searchTerm.contains ("*")&&
>  !searchTerm.contains ("?") ) {
>
>           filter.addJCRExpression ("fn:lower-case(@"+srchField+") =
> '"+Text.escapeIllegalXpathSearchChars(searchTerm.toLowerCase())+"'");
>
>       }
>
>       else {
>
>           //jcr:contains for case insensitive
>           filter.addContains ( srchField,
> Text.escapeIllegalXpathSearchChars(searchTerm));
>
>       }
>
>   }

This looks to me like a workaround rather than a fix for the real
problem, because the behaviour you describe does not make sense to me.
Can you inspect the tokens that your analyser creates? Make sure you
inspect them both during indexing (just store something) and during
searching (just search in the property). I am quite sure you will see
the issue then. Perhaps it has something to do with
Text.escapeIllegalXpathSearchChars, though it seems that it should
leave spaces untouched.

Regards Ard

>   ....
>
> }
>
>
> Hope that helps anyone who needs it.
>
> H. Wilson
>
>>> OK so it looks like I have one other issue. Using the configuration as
>>> posted below and sticking to my previous examples, with the addition of
>>> one
>>> with whitespace. With the following three in our repository:
>>>
>>>   .North.South.East.WestLand
>>>   .North.South.East.West_Land
>>>   .North.South.East.West Land    //yes that's a space
>>>
>>> ...using a jcr:contains, with exact name search with NO wild cards: the
>>> first two return properly, but the last one yields no result.
>>>
>>>   filter.addContains(@fullName,
>>>
>>> '"+org.apache.jackrabbit.util.Text.escapeIllegalXpathSearchChars(".North.South.East.West
>>> Land") +"'));
>>
>> I think the space in a contains is seen as an AND by the
>> Jackrabbit/Lucene QueryParser. I should test this however as I am not
>> sure. Perhaps you can put quotes around it, not sure if that works
>> though
>>
>> Regards Ard
>>
>>> According to the Lucene documentation, KeywordAnalyzer should be creating
>>> one token, plus combined with escaping the Illegal Characters (i.e.
>>> spaces),
>>> shouldn't this search work? Thanks again.
>>>
>>> H. Wilson
>
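
To make the suspicion quoted above concrete (that the space in a
jcr:contains term is treated as an AND while the index holds a single
keyword token), here is a minimal sketch in plain Java. It is a toy model
only, not Lucene or Jackrabbit code: the class and method names are
hypothetical, and splitting the search term on whitespace on the query
side is an assumption that has not been verified against the real
QueryParser. It covers only the exact-name case without wildcards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (hypothetical names): compares index-side and query-side tokens.
final class TokenMismatchSketch {

    // Index side: a keyword-style analyzer keeps the whole value as one token.
    static List<String> keywordTokens(String value) {
        List<String> tokens = new ArrayList<>();
        tokens.add(value);
        return tokens;
    }

    // Query side (ASSUMPTION): the term is split on whitespace and every
    // piece becomes its own required (ANDed) term.
    static List<String> whitespaceSplitTerms(String query) {
        return Arrays.asList(query.trim().split("\\s+"));
    }

    // A value matches only if every query term equals an indexed token.
    static boolean matchesAll(List<String> indexed, List<String> terms) {
        return indexed.containsAll(terms);
    }

    public static void main(String[] args) {
        String[] names = {
            ".North.South.East.WestLand",
            ".North.South.East.West_Land",
            ".North.South.East.West Land"
        };
        for (String name : names) {
            List<String> indexed = keywordTokens(name);
            List<String> terms = whitespaceSplitTerms(name);
            // Print both token lists next to each other, then the match result.
            System.out.println(indexed + " vs " + terms
                    + " -> " + matchesAll(indexed, terms));
        }
    }
}
```

Under this assumption only the name containing a space fails to match,
because neither of its two query terms equals the single indexed token.
Printing the real tokens at index time and at search time, as suggested
above, would confirm or rule out this explanation.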