jackrabbit-users mailing list archives

From "H. Wilson" <wils...@randdss.com>
Subject Re: jcr:contains with wildcards and underscores
Date Thu, 26 Aug 2010 16:29:04 GMT
For anyone who stumbles onto this post with the same problem: head over 
to http://markmail.org/thread/t5hmrob3jdmz7nqm for more discussion and 
the solution that ended up working for us.

H. Wilson

On 06/04/2010 09:21 AM, H. Wilson wrote:
> Hello,
>
> I am using Jackrabbit 2.0 with OCM. After searching the forums here 
> and for Lucene, as well as Google, I have yet to find an answer. 
> (As an aside, if this question should have gone to the Lucene users 
> list, please let me know!)
>
> For starters, you should know that our clients would like both 
> case-sensitive and case-insensitive options available to them. The 
> searches are on a property named fullName, which may contain 
> underscores and always contains a leading dot (also a client 
> requirement). And yes, we are aware that leading wildcard searches 
> are not ideal, but the clients still plan to use them. Here is my 
> issue:
>
>    * My searches using jcr:like work fine for all the scenarios I list
>      below.
>    * My searches with jcr:contains and exact names work fine (even with
>      underscores!).
>    * My jcr:contains searches using wildcards and underscores always
>      fail. I have even tried escaping them.
>
> Given there are two objects in our repository with the following 
> fullName properties:
>
>    .North.South.East.WestLand
>    .North.South.East.West_Land
>
>
> Both of the following work fine, and each return the respective object:
>
>    (jcr:contains(@fullName, '.North.South.East.WestLand'))
>    (jcr:contains(@fullName, '.North.South.East.West_Land'))
>
>
> The following jcr:contains queries return BOTH objects successfully:
>
>    *North*
>    .North*
>    .North.*
>
> The following queries successfully return the FIRST object:
>
>    *.South.East.WestLand
>    .*.South.East.WestLand
>    *South*.WestLand
>    *East.WestLand
>    *.WestLand
>    *East?WestLand
>    *?WestLand
>    *North.South.East.WestLand
>
> The following jcr:contains queries, identical to the ones above except 
> for the underscore, return nothing, although I would expect them to 
> return the SECOND object:
>
>    *.South.East.West_Land
>    .*.South.East.West_Land
>    *South*.West_Land
>    *East.West_Land
>    *.West_Land
>    *East?West_Land
>    *?West_Land
>    *North.South.East.West_Land
>
> UPDATE: After writing this long message, I remembered something. (I 
> have been trying to tackle this off and on for weeks, so please bear 
> with the slight memory loss; maybe having seen all this will help 
> others.) I remember reading somewhere that Lucene treats underscores 
> as token dividers. If so, a property value containing an underscore is 
> split into tokens and the underscore is essentially dropped, which 
> could explain why the exact-name search works. (Is this correct?) The 
> examples above used the StandardAnalyzer. I previously tried the 
> WhitespaceAnalyzer, but doing so disabled leading wildcard searches, 
> which our clients absolutely require. I know there is a way to turn on 
> leading wildcard searches, but I could not work out how to do it with 
> Jackrabbit. Any advice on an Analyzer that would satisfy our clients 
> would be GREATLY appreciated.
>
> Thanks for your time and patience,
> H. Wilson
>
>
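
The tokenization theory in the UPDATE above can be checked against the reported results with a small toy model. This is only a sketch under stated assumptions, not Lucene's StandardAnalyzer: it assumes that letters, digits, and dots stay together in one lower-cased token, that any other character (including the underscore) ends a token, that an exact query is tokenized the same way as the indexed value, and that a wildcard query is compared unanalyzed against each single token. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Toy model of "underscore splits tokens"; NOT the real Lucene grammar.
final class TokenWildcardSketch {

    // Assumption: runs of letters, digits, and dots form one token;
    // every other character, including '_', is a token divider.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^A-Za-z0-9.]+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Exact search: the query is tokenized too, then matched as a
    // contiguous token sequence, so the dropped underscore does not matter.
    static boolean matchesExact(String query, String indexedValue) {
        List<String> queryTokens = tokenize(query);
        if (queryTokens.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(tokenize(indexedValue), queryTokens) >= 0;
    }

    // Wildcard search: '*' and '?' are translated to a regex, all other
    // characters are kept literally, and the result must match ONE token.
    static boolean matchesWildcard(String wildcard, String indexedValue) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        Pattern pattern = Pattern.compile(regex.toString());
        for (String token : tokenize(indexedValue)) {
            if (pattern.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String first = ".North.South.East.WestLand";
        String second = ".North.South.East.West_Land";
        System.out.println(tokenize(first));
        System.out.println(tokenize(second));
        for (String q : new String[] {"*North*", "*.WestLand", "*.West_Land"}) {
            System.out.println(q + " first=" + matchesWildcard(q, first)
                    + " second=" + matchesWildcard(q, second));
        }
    }
}
```

In this model no indexed token ever contains an underscore, so a wildcard term that includes a literal underscore can never match, while an exact query still matches because its own underscore is dropped the same way. That is consistent with the results listed above, but it does not confirm what the real analyzer does.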

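
Since the original message reports that jcr:like handles every scenario, here is a hedged sketch of how a user-supplied `*`/`?` pattern could be turned into a jcr:like XPath statement. It relies on the JCR 1.0 convention that `%` matches any run of characters, `_` matches exactly one character, and a backslash escapes a literal `%`, `_`, or backslash. The `//*` node selection and the helper names are assumptions for illustration (an OCM-mapped repository would likely restrict the node type), and the case-insensitive branch assumes `fn:lower-case` is available in the repository's XPath dialect.

```java
import java.util.Locale;

// Hypothetical helper; builds query strings only, it does not run them.
final class LikeQuerySketch {

    // Map '*' -> '%' and '?' -> '_', escaping characters that jcr:like
    // would otherwise treat as wildcards (notably a literal underscore).
    static String toLikePattern(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            switch (c) {
                case '*':
                    sb.append('%');
                    break;
                case '?':
                    sb.append('_');
                    break;
                case '%':
                case '_':
                case '\\':
                    sb.append('\\').append(c);
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build an XPath statement on the fullName property. For the
    // case-insensitive variant both sides are lower-cased.
    static String buildXPath(String wildcard, boolean caseSensitive) {
        String input = caseSensitive ? wildcard : wildcard.toLowerCase(Locale.ROOT);
        // An apostrophe inside an XPath string literal is doubled.
        String literal = toLikePattern(input).replace("'", "''");
        String operand = caseSensitive ? "@fullName" : "fn:lower-case(@fullName)";
        return "//*[jcr:like(" + operand + ", '" + literal + "')]";
    }

    public static void main(String[] args) {
        System.out.println(buildXPath("*.West_Land", true));
        System.out.println(buildXPath("*East?West_Land", false));
    }
}
```

The resulting string would be passed to the JCR QueryManager as an XPath query. Note that an unescaped underscore in a jcr:like pattern still matches a literal underscore, because it matches any single character, so the escaping mainly prevents unintended extra matches.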