jackrabbit-users mailing list archives

From "H. Wilson" <wils...@randdss.com>
Subject jcr:contains with wildcards and underscores
Date Fri, 04 Jun 2010 13:21:59 GMT
Hello,

I am using Jackrabbit 2.0 with OCM, and after searching the forums both
here and on Lucene, as well as Google, I have yet to find an answer. (As
an aside, if this question should have gone to the Lucene users' list,
please let me know!)

For starters, you should know that our clients would like both
case-sensitive and case-insensitive options available to them. The
searches are on a property named fullName, which may contain underscores
and always starts with a dot. (This is also a client requirement.) And
yes, we are aware that leading wildcard searches perform poorly, but the
clients still plan to use them. Here is my issue:

    * My searches using jcr:like work fine for all the scenarios I list
      below.
    * My searches with jcr:contains and exact names work fine (even with
      underscores!).
    * My jcr:contains searches that combine wildcards and underscores
      always fail. I have even tried escaping the underscores.

Given there are two objects in our repository with the following 
fullName properties:

    .North.South.East.WestLand
    .North.South.East.West_Land


Both of the following work fine, and each returns its respective object:

    (jcr:contains(@fullName, '.North.South.East.WestLand'))
    (jcr:contains(@fullName, '.North.South.East.West_Land'))
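
For reference, a minimal sketch of how an exact-name statement like the two above could be assembled in Java. The class and method names are hypothetical, and the `//*` path (match any node) is an assumption, since the message only shows the predicate that OCM generates. The sketch only escapes single quotes for the XPath string literal; it does not attempt to escape full-text operators inside the search term.

```java
/** Hypothetical helper; not part of Jackrabbit or OCM. */
final class FullNameContainsQuery {

    // XPath string literals escape a single quote by doubling it.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Builds an XPath statement with a jcr:contains predicate on @fullName.
    // Assumption: "//*" (any node) stands in for the real node type or path.
    static String exactMatch(String fullName) {
        return "//*[jcr:contains(@fullName, " + quote(fullName) + ")]";
    }

    public static void main(String[] args) {
        System.out.println(exactMatch(".North.South.East.WestLand"));
        System.out.println(exactMatch(".North.South.East.West_Land"));
    }
}
```

With the plain JCR API, the resulting string would typically be handed to `QueryManager.createQuery(statement, Query.XPATH)`.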


The following jcr:contains queries return BOTH objects successfully:

    *North*
    .North*
    .North.*

The following queries successfully return the FIRST object:

    *.South.East.WestLand
    .*.South.East.WestLand
    *South*.WestLand
    *East.WestLand
    *.WestLand
    *East?WestLand
    *?WestLand
    *North.South.East.WestLand

The following jcr:contains queries, identical except for the underscore,
return nothing, although I would expect them to return the SECOND object:

    *.South.East.West_Land
    .*.South.East.West_Land
    *South*.West_Land
    *East.West_Land
    *.West_Land
    *East?West_Land
    *?West_Land
    *North.South.East.West_Land
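
Since the jcr:like versions of these searches are reported to work, one possible stopgap is to translate the `*`/`?` patterns into jcr:like patterns. The sketch below is hedged: it assumes the usual jcr:like rules (`%` matches any run of characters, `_` matches exactly one, and a backslash escapes a literal `%` or `_`), and it assumes `fn:lower-case` is used for the case-insensitive variant. All class and method names are hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper; not part of Jackrabbit or OCM. */
final class LikePatternDemo {

    // Maps '*' to '%' and '?' to '_', and escapes characters that are
    // special to jcr:like (assumed escape character: backslash).
    static String toLikePattern(String wildcard) {
        StringBuilder out = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            switch (c) {
                case '*': out.append('%'); break;
                case '?': out.append('_'); break;
                case '%':
                case '_':
                case '\\': out.append('\\').append(c); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Builds the predicate only; single quotes are doubled for the XPath literal.
    static String likeExpression(String wildcard, boolean caseInsensitive) {
        String pattern = toLikePattern(wildcard);
        String operand = "@fullName";
        if (caseInsensitive) {
            pattern = pattern.toLowerCase(Locale.ROOT);
            operand = "fn:lower-case(@fullName)";
        }
        return "jcr:like(" + operand + ", '" + pattern.replace("'", "''") + "')";
    }

    public static void main(String[] args) {
        System.out.println(likeExpression("*East?West_Land", false));
        System.out.println(likeExpression("*.West_Land", true));
    }
}
```

Note that an unescaped `_` in a jcr:like pattern is itself a single-character wildcard, which is why the literal underscore is escaped here.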

UPDATE: After writing this long message, I remembered something. (I have
been trying to tackle this off and on for weeks, so please bear with the
slight memory loss; maybe having seen all this will help others.) I
remember reading somewhere that Lucene treats underscores as token
dividers. So when an object property contains an underscore, the value
is split into tokens and the underscore is essentially dropped. That
could explain why the exact-name search works while the wildcard
searches with underscores fail. (Is this correct?) The examples above
used the StandardAnalyzer. I previously tried the WhitespaceAnalyzer,
but doing so disabled leading wildcard searches, which our clients
absolutely require. I know there is a way to turn leading wildcard
searches on, but I could not work out how to do it with Jackrabbit. Any
advice on an analyzer that would satisfy our clients would be GREATLY
appreciated.
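
The hypothesis in the UPDATE can be illustrated with a toy model. This is NOT Lucene's StandardAnalyzer; it is a sketch under two assumptions: (1) indexed values are split on underscores and whitespace and then lowercased, and (2) a wildcard term is not analyzed and is compared as-is against each single indexed token, while an exact term is tokenized the same way as the indexed value. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Toy model of the underscore hypothesis; not Lucene's real tokenizer. */
final class UnderscoreTokenDemo {

    // Assumption 1: split on underscores and whitespace, then lowercase.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[_\\s]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Translates '*' (any run) and '?' (one character) into a regex.
    static Pattern wildcardToRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Assumption 2: the wildcard term must match one whole indexed token.
    static boolean wildcardMatches(String wildcard, String indexedValue) {
        Pattern pattern = wildcardToRegex(wildcard);
        for (String token : tokenize(indexedValue)) {
            if (pattern.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }

    // Exact terms are tokenized too, then matched as a token sequence.
    static boolean exactMatches(String term, String indexedValue) {
        return Collections.indexOfSubList(tokenize(indexedValue), tokenize(term)) >= 0;
    }

    public static void main(String[] args) {
        String second = ".North.South.East.West_Land";
        System.out.println(tokenize(second));
        System.out.println(wildcardMatches("*.West_Land", second));
        System.out.println(exactMatches(second, second));
    }
}
```

In this model no indexed token ever contains an underscore, so a wildcard term that contains one cannot match, whereas `*North*`, `.North*`, and the exact name still do. That is consistent with the results listed above, but it does not prove that Jackrabbit behaves this way internally.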

Thanks for your time and patience,
H. Wilson

