jackrabbit-users mailing list archives

From Ard Schrijvers <a.schrijv...@onehippo.com>
Subject Re: Problems with hyphen in JSR-170 XPath query using jcr:contains
Date Mon, 30 Aug 2010 13:38:42 GMT
On Mon, Aug 30, 2010 at 3:30 PM, H. Wilson <wilsonh@randdss.com> wrote:
>  Ard,
>
> You are absolutely right.. and this didn't make sense to me either. I think
> I was too worn out from my week and too excited to have code that "worked"
> to notice the obvious... this must be a workaround. However, I will need a
> little guidance on how to inspect the tokens. I have Luke, but never really
> understood how to use it properly. Could you give me a clear list of steps,
> or point me to a resource I missed, on how I would go about inspecting
> tokens during insert/search? Thanks.

I'd just print them to your console with Token#term(), or step through
with a debugger. If you do that during both indexing and searching, I
think you will see a difference in the tokens that explains *why* Lucene
doesn't find a hit for your use case with spaces.
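
A minimal sketch of that comparison, independent of any Lucene version: it
mimics a keyword-style analyzer (the whole value as one token) at index
time and a whitespace split at query time, then prints both token lists
side by side. The class and method names are hypothetical, and the
whitespace split is only an assumption about what the query parser does
with the search text; against the real index you would print the
analyzer's actual tokens instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TokenInspection {

    // Keyword-style analysis: the whole value becomes exactly one token.
    static List<String> keywordTokens(String value) {
        return Collections.singletonList(value);
    }

    // ASSUMPTION (not verified against Jackrabbit): the search text is
    // split on whitespace before any analyzer sees it.
    static List<String> whitespaceSplitTokens(String query) {
        List<String> tokens = new ArrayList<>();
        for (String part : query.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Terms are ANDed: every query token must equal some indexed token.
    static boolean matches(List<String> indexed, List<String> query) {
        return !query.isEmpty() && indexed.containsAll(query);
    }

    public static void main(String[] args) {
        String[] names = {
            ".North.South.East.WestLand",
            ".North.South.East.West_Land",
            ".North.South.East.West Land"
        };
        for (String name : names) {
            List<String> indexed = keywordTokens(name);
            List<String> query = whitespaceSplitTokens(name);
            // Print both sides so any mismatch is visible at a glance.
            System.out.println("indexed=" + indexed
                    + " query=" + query
                    + " hit=" + matches(indexed, query));
        }
    }
}
```

If the assumption holds, only the name with the space ends up with
different token lists on the two sides, which would be the kind of
difference to look for in the real output.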

Luke is hard to use with Jackrabbit because of the multi-index setup and
the field value prefixing. The prefixing is unfortunate and no longer
strictly necessary, but it is there for historical reasons, from the days
when Lucene could not handle very many unique field names.

Regards Ard

>
> H. Wilson
>
> On 08/30/2010 03:30 AM, Ard Schrijvers wrote:
>>
>> Hello,
>>
>> On Fri, Aug 27, 2010 at 9:06 PM, H. Wilson<wilsonh@randdss.com>  wrote:
>>>
>>>  OK, well I got the spaces part figured out, and will post it for anyone
>>> who
>>> needs it. Putting quotes around the spaces unfortunately did not work.
>>>  During testing, I determined that if you performed the following query
>>> for
>>> the exact fullName property:
>>>
>>>    filter.addContains ( @fullName,
>>> '"+Text.escapeIllegalXpathSearchChars(".North.South.East.West Land"));
>>>
>>> It would return nothing. But tweak it a little and add a wildcard, and it
>>> would return results:
>>>
>>>   filter.addContains ( @fullName,
>>>   '"+Text.escapeIllegalXpathSearchChars(".North.South.East.West Lan*"));
>>
>> This does not make sense...see below
>>
>>> But since I did not want to throw in wild cards where they might not be
>>> wanted, if a search string contained spaces, did not contain wild cards
>>> and
>>> the user was not concerned with case sensitivity, I used the
>>> fn:lower-case.
>>> So I ended up with the following excerpt (our clients wanted options for
>>> case sensitive and case insensitive searching) .
>>>
>>> public OurParameter[] getOurParameters (boolean
>>> performCaseSensitiveSearch,
>>> String searchTerm, String srchField ) { //srchField in this case was
>>> fullName
>>>
>>>   .....
>>>
>>>   if ( performCaseSensitiveSearch) {
>>>
>>>       //jcr:like for case sensitive
>>>       filter.orJCRExpression ("jcr:like(@" + srchField +",
>>> '"+Text.escapeIllegalXpathSearchChars (searchTerm)+"')");
>>>
>>>   }
>>>   else {
>>>
>>>       //only use fn:lower-case if there are spaces, with NO wild cards
>>>
>>>       if ( searchTerm.contains (" ") && !searchTerm.contains ("*") &&
>>>  !searchTerm.contains ("?") ) {
>>>
>>>           filter.addJCRExpression ("fn:lower-case(@"+srchField+") =
>>> '"+Text.escapeIllegalXpathSearchChars(searchTerm.toLowerCase())+"'");
>>>
>>>       }
>>>
>>>       else {
>>>
>>>           //jcr:contains for case insensitive
>>>           filter.addContains ( srchField,
>>> Text.escapeIllegalXpathSearchChars(searchTerm));
>>>
>>>       }
>>>
>>>   }
>>
>> This seems to me a workaround for the real problem, because it
>> just doesn't make sense to me. Can you inspect the tokens that are
>> created by your analyser? Make sure you inspect the tokens during
>> indexing (just store something) and during searching (just search in
>> the property). I am quite sure you'll see the issue then. Perhaps it is
>> something with Text.escapeIllegalXpathSearchChars, though it seems that
>> it should leave spaces untouched.
>>
>> Regards Ard
>>
>>
>>>   ....
>>>
>>> }
>>>
>>>
>>> Hope that helps anyone who needs it.
>>>
>>> H. Wilson
>>>
>>>>> OK so it looks like I have one other issue. Using the configuration as
>>>>> posted below and sticking to my previous examples, with the addition of
>>>>> one with whitespace. With the following three in our repository:
>>>>>
>>>>>   .North.South.East.WestLand
>>>>>   .North.South.East.West_Land
>>>>>   .North.South.East.West Land    //yes that's a space
>>>>>
>>>>> ...using a jcr:contains, with exact name search with NO wild cards: the
>>>>> first two return properly, but the last one yields no result.
>>>>>
>>>>>   filter.addContains(@fullName,
>>>>>
>>>>>
>>>>> '"+org.apache.jackrabbit.util.Text.escapeIllegalXpathSearchChars(".North.South.East.West
>>>>> Land") +"'));
>>>>
>>>> I think the space in a contains is seen as an AND by the
>>>> Jackrabbit/Lucene QueryParser. I should test this however as I am not
>>>> sure. Perhaps you can put quotes around it, not sure if that works
>>>> though
>>>>
>>>> Regards Ard
>>>>
>>>>> According to the Lucene documentation, KeywordAnalyzer should be
>>>>> creating
>>>>> one token, plus combined with escaping the Illegal Characters (i.e.
>>>>> spaces),
>>>>> shouldn't this search work? Thanks again.
>>>>>
>>>>> H. Wilson
>
