jackrabbit-users mailing list archives

From Ard Schrijvers <a.schrijv...@onehippo.com>
Subject Re: How to handle the colon character within fulltext search?
Date Fri, 25 Jun 2010 12:19:32 GMT
Hello Gary,

In the end, the part in the contains function gets delegated to the
Lucene QueryParser. So, you can use Lucene query syntax in contains,
for example query-time boosting like 'myterm^10' (unless it gets
swallowed by the XPath/SQL parser of Jackrabbit, like the ~ fuzzy operator).

Anyway, in the Lucene query parser a colon means that you search within a
specific field; see [1] under *Fields*.

The end of that page explains how to escape special characters (use \).
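
As a rough illustration of the backslash escaping described on that page, here is a minimal sketch in plain Java. The class and method names are hypothetical and are not part of Jackrabbit or Lucene; Lucene's own QueryParser also has a static escape(String) method that does the same job. The character list is taken from the special-characters section of [1]. The escaped term still has to be embedded in the XPath or SQL string literal, which has its own quoting rules.

```java
/** Hypothetical helper for illustration only; not a Jackrabbit or Lucene class. */
final class LuceneEscapeSketch {

    // Special characters listed in the Lucene 2.4 query parser syntax page.
    // "&&" and "||" are covered by escaping every '&' and '|' individually.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    /** Prefixes every Lucene special character in the term with a backslash. */
    static String escape(String term) {
        StringBuilder sb = new StringBuilder(term.length() + 8);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

For example, LuceneEscapeSketch.escape("Tr: Tr: your response") puts a backslash in front of each colon, so the parser no longer reads "Tr" as a field name.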

However, combining the escaping with a prefix wildcard again does not
seem to work when I test it. I did not test it directly against Lucene,
so it is hard to say whether this is a Lucene QueryParser constraint in
combination with query expansion for the wildcard, or a Jackrabbit issue.

That said, I think in the end you do not want to use the prefix
wildcard anyway: you'll run into terrible performance and memory
usage problems. This is a general problem with inverted indexes (which
you can circumvent by indexing every term reversed as well... but that
is not done by Jackrabbit, of course).

Anyway, the working solution to your problem is to use 'like'. You
are not actually doing a free-text search (free-text search works on
Lucene terms, not on whole sentences).

The xpath equivalent that works is for example:

//*[jcr:like(@myprop, 'my:colon having sentence')]

Though again, jcr:like scales badly with respect to performance and memory.
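
For building such a jcr:like expression from user input, a minimal sketch in plain Java could look like the following. The class and method names are hypothetical. It assumes the JCR 1.0 conventions that % and _ are the jcr:like wildcards, that backslash is the escape character inside the pattern, and that a single quote inside an XPath string literal is written by doubling it; please verify these against your Jackrabbit version. The property name is assumed to be a trusted, valid JCR name and is not escaped.

```java
/** Hypothetical helper for illustration only; not a Jackrabbit class. */
final class JcrLikeQuerySketch {

    /** Escapes a value so that jcr:like treats it as literal text. */
    static String escapeLikeLiteral(String value) {
        return value
                .replace("\\", "\\\\") // escape the escape character first
                .replace("%", "\\%")   // literal percent sign, not a wildcard
                .replace("_", "\\_")   // literal underscore, not a wildcard
                .replace("'", "''");   // quote doubling for the XPath string literal
    }

    /** Exact match, same shape as the XPath query shown above. */
    static String likeQuery(String property, String value) {
        return "//*[jcr:like(@" + property + ", '" + escapeLikeLiteral(value) + "')]";
    }

    /** Substring match: wraps the escaped value in % wildcards. */
    static String likeSubstringQuery(String property, String value) {
        return "//*[jcr:like(@" + property + ", '%" + escapeLikeLiteral(value) + "%')]";
    }
}
```

The substring variant is the closest equivalent of the '*...*' pattern in the original question, and it is also the variant where the scaling concern applies most.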

Regards Ard

[1] http://lucene.apache.org/java/2_4_0/queryparsersyntax.html

On Fri, Jun 25, 2010 at 1:59 PM, Gary Long <long@magillem.com> wrote:
> On 25/06/2010 12:17, Alexander Klimetschek wrote:
>> On Fri, Jun 25, 2010 at 11:42, Gary Long<long@magillem.com>  wrote:
>>> Hello there :)
>>> I'm using the fulltext search feature of Jackrabbit and I'm facing a little
>>> problem with the colon character (:). For example, if I search for a mail
>>> whose subject is "Tr : Tr : your response", I can't find it. If I search for
>>> "your response" the e-mail is found.
>>> My SQL query is:
>>> SELECT * FROM mnt:resource WHERE (contains(jcr:text, '*tr: tr: your response*')
>>> OR contains(jcr:name, '*tr: tr: your response*'));
>> You should escape the query for the contains/jcr:contains function
>> using the Text.escapeIllegalXpathSearchChars helper from
>> jackrabbit-jcr-commons:
>> http://wiki.apache.org/jackrabbit/EncodingAndEscaping#Escaping_values_in_queries
>> Regards,
>> Alex
> I tried this method but it didn't do anything : /
> Here is my code :
> String param = "Tr: Tr: your response";
> String escapedParam =
> org.apache.jackrabbit.util.Text.escapeIllegalXpathSearchChars(param);
> String query = "SELECT * FROM mnt:resource WHERE (contains(jcr:text, '*" +
> escapedParam + "*') OR contains(jcr:name, '*" + escapedParam + "*'))";
> In debug mode, I looked at the value of textQuery in the query and it is
> still "Tr: Tr your response". The colon character doesn't seem to be
> escaped. : /
> Regards,
> Gary
