jackrabbit-users mailing list archives

From "H. Wilson" <wils...@randdss.com>
Subject Re: Problems with hyphen in JSR-170 XPath query using jcr:contains
Date Mon, 30 Aug 2010 13:30:22 GMT
  Ard,

You are absolutely right, and this didn't make sense to me either. I
think I was too worn out from my week, and too excited to have code that
"worked", to notice the obvious: this must be a workaround. However, I
will need a little guidance on how to inspect the tokens. I have Luke,
but I never really understood how to use it properly. Could you give me a
clear list of steps, or point me to a resource I missed, on how to
inspect tokens during insert/search? Thanks.

H. Wilson

On 08/30/2010 03:30 AM, Ard Schrijvers wrote:
> Hello,
>
> On Fri, Aug 27, 2010 at 9:06 PM, H. Wilson<wilsonh@randdss.com>  wrote:
>>   OK, well I got the spaces part figured out, and will post it for anyone who
>> needs it. Putting quotes around the spaces unfortunately did not work.
>>   During testing, I determined that if you performed the following query for
>> the exact fullName property:
>>
>>     filter.addContains ( @fullName,
>> '"+Text.escapeIllegalXpathSearchChars(".North.South.East.West Land"));
>>
>> It would return nothing. But tweak it a little and add a wildcard, and it
>> would return results:
>>
>>    filter.addContains ( @fullName,
>>    '"+Text.escapeIllegalXpathSearchChars(".North.South.East.West Lan*"));
> This does not make sense...see below
>
>> But since I did not want to throw in wild cards where they might not be
>> wanted, if a search string contained spaces, did not contain wild cards and
>> the user was not concerned with case sensitivity, I used the fn:lower-case.
>> So I ended up with the following excerpt (our clients wanted options for
>> case sensitive and case insensitive searching) .
>>
>> public OurParameter[] getOurParameters (boolean performCaseSensitiveSearch,
>> String searchTerm, String srchField ) { //srchField in this case was
>> fullName
>>
>>    .....
>>
>>    if ( performCaseSensitiveSearch) {
>>
>>        //jcr:like for case sensitive
>>        filter.orJCRExpression ("jcr:like(@" + srchField +",
>> '"+Text.escapeIllegalXpathSearchChars (searchTerm)+"')");
>>
>>    }
>>    else {
>>
>>        //only use fn:lower-case if there are spaces, with NO wild cards
>>
>>        if ( searchTerm.contains (" ") && !searchTerm.contains ("*") &&
>>   !searchTerm.contains ("?") ) {
>>
>>            filter.addJCRExpression ("fn:lower-case(@"+srchField+") =
>> '"+Text.escapeIllegalXpathSearchChars(searchTerm.toLowerCase())+"'");
>>
>>        }
>>
>>        else {
>>
>>            //jcr:contains for case insensitive
>>            filter.addContains ( srchField,
>> Text.escapeIllegalXpathSearchChars(searchTerm));
>>
>>        }
>>
>>    }
> This seems to me a workaround for the real problem, because it
> just doesn't make sense to me. Can you inspect the tokens that are
> created by your analyser? Make sure you inspect the tokens during
> indexing (just store something) and during searching (just search in
> the property). I am quite sure you'll see the issue then. Perhaps it is
> something with Text.escapeIllegalXpathSearchChars, though it seems that
> it should leave spaces untouched.
>
> Regards Ard
>
>
>>    ....
>>
>> }
>>
>>
>> Hope that helps anyone who needs it.
>>
>> H. Wilson
>>
>>>> OK so it looks like I have one other issue. Using the configuration as
>>>> posted below and sticking to my previous examples, with the addition of
>>>> one with whitespace, we have the following three in our repository:
>>>>
>>>>    .North.South.East.WestLand
>>>>    .North.South.East.West_Land
>>>>    .North.South.East.West Land    //yes that's a space
>>>>
>>>> ...using a jcr:contains, with exact name search with NO wild cards: the
>>>> first two return properly, but the last one yields no result.
>>>>
>>>>    filter.addContains(@fullName,
>>>>
>>>> '"+org.apache.jackrabbit.util.Text.escapeIllegalXpathSearchChars(".North.South.East.West
>>>> Land") +"'));
>>> I think the space in a contains is seen as an AND by the
>>> Jackrabbit/Lucene QueryParser. I should test this however as I am not
>>> sure. Perhaps you can put quotes around it, not sure if that works
>>> though
>>>
>>> Regards Ard
>>>
>>>> According to the Lucene documentation, KeywordAnalyzer should be creating
>>>> one token. Combined with escaping the illegal characters (i.e. spaces),
>>>> shouldn't this search work? Thanks again.
>>>>
>>>> H. Wilson
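
For anyone following the thread, the case-handling branch quoted above can be sketched as a small function that returns the XPath predicate string, which makes it easy to print and compare what each kind of search term produces. This is only a rough sketch under assumptions: SearchPredicates, build and escape are hypothetical names, escape is a stand-in that returns its input unchanged (real code would call Text.escapeIllegalXpathSearchChars), and it assumes the filter's addContains ends up emitting a jcr:contains(...) predicate, which the thread does not show.

```java
import java.util.Locale;

// Hypothetical helper mirroring the branching logic quoted in the thread.
final class SearchPredicates {

    // Stand-in for Jackrabbit's Text.escapeIllegalXpathSearchChars.
    // Assumption: returns the input unchanged so the sketch runs without Jackrabbit.
    static String escape(String s) {
        return s;
    }

    // Builds the predicate for one search term against one property.
    static String build(boolean caseSensitive, String searchTerm, String field) {
        if (caseSensitive) {
            // jcr:like for case-sensitive matching
            return "jcr:like(@" + field + ", '" + escape(searchTerm) + "')";
        }
        boolean hasSpace = searchTerm.contains(" ");
        boolean hasWildcard = searchTerm.contains("*") || searchTerm.contains("?");
        if (hasSpace && !hasWildcard) {
            // Exact comparison on the lower-cased property value.
            // Locale.ROOT is an addition here; the quoted code calls toLowerCase().
            return "fn:lower-case(@" + field + ") = '"
                    + escape(searchTerm.toLowerCase(Locale.ROOT)) + "'";
        }
        // Assumed shape of what addContains produces for the full-text case.
        return "jcr:contains(@" + field + ", '" + escape(searchTerm) + "')";
    }

    public static void main(String[] args) {
        System.out.println(build(false, ".North.South.East.West Land", "fullName"));
        System.out.println(build(false, ".North.South.East.West Lan*", "fullName"));
        System.out.println(build(true, ".North.South.East.West Land", "fullName"));
    }
}
```

Printing the predicates this way shows only which branch a term takes. It does not show the tokens the analyzer produces at index or search time, which is what Ard suggests checking.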
