jackrabbit-users mailing list archives

From KÖLL Claus <C.KO...@TIROL.GV.AT>
Subject AW: AW: FullText Search Problem
Date Fri, 30 Nov 2007 10:57:24 GMT
hi marcel,

thanks for the information.
can you add your comments to the jira issue?
https://issues.apache.org/jira/browse/JCR-1248

ok, if i try to run the query like this

"//element(*, nt:base)[jcr:contains(., 'test\"!\"')]"

it works fine

but i think jackrabbit should handle the query properly even if the character is at the end.

>>What I propose is to limit the set to only those that are really required. e.g. 
>>the "!" is equivalent to "-" and the keyword NOT. And then clearly document it.

yes, clear documentation is often the problem :-)

>>This however means that you need to escape more than the specified set of 
>>characters.

should we add a utility class that handles this kind of escaping? we already have ISO9075,
which handles filenames, and ISO8601, which handles date/time values, so it would be good
to have one for encoding search literals as well.

BR,
claus


-----Original Message-----
From: Marcel Reutegger [mailto:marcel.reutegger@gmx.net]
Sent: Friday, 30 November 2007 11:11
To: users@jackrabbit.apache.org
Subject: Re: AW: FullText Search Problem


KÖLL Claus wrote:
> so either i will filter some characters from the search string or jackrabbit should handle it.
> i think the second one will be better

JSR 170 specifies a set of characters that need to be escaped if one wishes to 
use them as literal instead of the semantics the spec gives them:

"Within the searchexp literal instances of single quote ("'"), double quote 
(""") and hyphen ("-") must be escaped with a backslash ("\"). Backslash itself 
must therefore also be escaped, ending up as double backslash ("\\")."
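
As a rough illustration of the rule quoted above, the following sketch prefixes each of the four characters the spec names with a backslash. The class name SpecSearchEscaper and its escape method are made up for this example and are not part of JCR or Jackrabbit. It only covers escaping inside the search expression; quoting the resulting string literal for the surrounding XPath statement is a separate step that is not handled here.

```java
// Hypothetical helper, not a Jackrabbit or JCR API.
final class SpecSearchEscaper {

    private SpecSearchEscaper() {
    }

    /**
     * Prefixes single quote, double quote, hyphen and backslash
     * with a backslash, as described in the quoted spec text.
     */
    static String escape(String term) {
        StringBuilder sb = new StringBuilder(term.length() + 8);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '\'' || c == '"' || c == '-' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

With this sketch, SpecSearchEscaper.escape("a-b") yields a\-b, and a single backslash in the input comes out as a double backslash.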

Jackrabbit extended this set to provide additional functionality, e.g. you can 
do a fuzzy search: test~

This however means that you need to escape more than the specified set of 
characters. Strictly speaking this is a violation of the spec. But without 
extending this set of characters additional functionality is very difficult to 
implement.

The current set of special characters that need escaping is:

"\\", "+", "-", "!", "(", ")", ":", "^", "[", "]", "\"", "{", "}", "~", "*", "?"
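
A minimal sketch of what escaping this extended set could look like in application code, assuming that a preceding backslash is the right escape for every character in the list above. The class name ExtendedSearchEscaper is hypothetical and not an existing Jackrabbit class. Note that the single quote is not part of this list, so it is left untouched here.

```java
// Hypothetical helper, not an existing Jackrabbit class.
final class ExtendedSearchEscaper {

    // The 16 characters from the list above, written as one Java string.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?";

    private ExtendedSearchEscaper() {
    }

    /** Prefixes every character found in SPECIAL with a backslash. */
    static String escape(String term) {
        StringBuilder sb = new StringBuilder(term.length() + 8);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

With this sketch, ExtendedSearchEscaper.escape("test!") yields test\! so the trailing "!" would be passed on as a literal character rather than as an operator.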

What I propose is to limit the set to only those that are really required. e.g. 
the "!" is equivalent to "-" and the keyword NOT. And then clearly document it.

regards
  marcel
