jackrabbit-dev mailing list archives

From "Alex Parvulescu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (JCR-3478) Partial search terms matching fails when there is a lot of matching content outside the query's scope
Date Fri, 07 Dec 2012 09:41:21 GMT
Alex Parvulescu created JCR-3478:
------------------------------------

             Summary: Partial search terms matching fails when there is a lot of matching
content outside the query's scope
                 Key: JCR-3478
                 URL: https://issues.apache.org/jira/browse/JCR-3478
             Project: Jackrabbit Content Repository
          Issue Type: Bug
          Components: jackrabbit-core
            Reporter: Alex Parvulescu


This continues the work from JCR-3428.

It appears that for a full-text search such as 'ipsu*', the WildcardQueryRewrite
generates the list of matching tokens used as the query condition from all of the
matching tokens found in the index, not just the ones that fall within the query's scope.

This list is then used for excerpt generation with a 'must all match' condition,
which breaks the excerpts.

For example, if we have the following content:
/
  /testNode1 with the property 'text'='lorem ipsum'
  /testNode2 with the property 'foo'='ipsuFoo'
  /testNode3 with the property 'bar'='ipsuBar'

and the query testNode1//*[jcr:contains(., 'ipsu*')]/rep:excerpt(.)

What happens is that the WildcardQueryRewrite extracts 3 terms for the highlighter: ipsum,
ipsuFoo and ipsuBar, which are passed as a single list of terms, basically a 'must all
match' condition. Since testNode1 contains only 'ipsum', that condition cannot be met.

What I want to do is break this list into a list of 3 sets, each containing a single term,
turning it into a 'match any' type of condition.
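
To illustrate the difference between the two conditions, here is a minimal,
self-contained sketch in plain Java. It does not use the Jackrabbit or Lucene
APIs; the class ExcerptTermMatching and the helpers matchesAnyGroup and
splitIntoSingletons are hypothetical names made up for this example, and the
substring check only stands in for the real highlighter logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ExcerptTermMatching {

    // Hypothetical stand-in for the highlighter check: the text matches if
    // it contains every term of at least one group.
    static boolean matchesAnyGroup(String text, List<Set<String>> groups) {
        for (Set<String> group : groups) {
            boolean allPresent = true;
            for (String term : group) {
                if (!text.contains(term)) {
                    allPresent = false;
                    break;
                }
            }
            if (allPresent) {
                return true;
            }
        }
        return false;
    }

    // Turn one 'must all match' list into single-term groups ('match any').
    static List<Set<String>> splitIntoSingletons(List<String> terms) {
        List<Set<String>> groups = new ArrayList<>();
        for (String term : terms) {
            Set<String> group = new HashSet<>();
            group.add(term);
            groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Terms expanded from 'ipsu*' across the whole index.
        List<String> expanded = Arrays.asList("ipsum", "ipsuFoo", "ipsuBar");
        String text = "lorem ipsum"; // the 'text' property of testNode1

        // Current behaviour: a single group holding all three terms.
        List<Set<String>> singleList = new ArrayList<>();
        singleList.add(new HashSet<>(expanded));

        // Proposed behaviour: three groups with one term each.
        List<Set<String>> singletons = splitIntoSingletons(expanded);

        System.out.println("single list: " + matchesAnyGroup(text, singleList)); // false
        System.out.println("singletons:  " + matchesAnyGroup(text, singletons)); // true
    }
}
```

With the single list, 'lorem ipsum' fails because it lacks ipsuFoo and ipsuBar;
with the three single-term sets, the 'ipsum' set alone is enough.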

The interesting part here is that in order to preserve the existing functionality for the
Japanese language as well (where a word can consist of several tokens that are passed around
via a PhraseQuery), I'm going to explicitly check for PhraseQuery tokens and transform them
into a 'must all match' list of tokens.
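
A rough sketch of that grouping rule, again in plain Java rather than against
the Lucene API: the Clause class below is a hypothetical stand-in for a
rewritten query clause (one token for a plain term, several for a phrase), and
"tok1"/"tok2" are placeholder tokens for a multi-token word, not real analyzer
output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PhraseAwareGrouping {

    // Hypothetical stand-in for a rewritten query clause: a plain term
    // carries one token, a phrase carries several.
    static final class Clause {
        final List<String> tokens;

        Clause(String... tokens) {
            this.tokens = Arrays.asList(tokens);
        }
    }

    // Each clause becomes one group. A plain term yields a single-term set
    // ('match any' across groups); a phrase keeps all of its tokens together
    // in one set ('must all match' within that group).
    static List<Set<String>> toTermGroups(List<Clause> clauses) {
        List<Set<String>> groups = new ArrayList<>();
        for (Clause clause : clauses) {
            groups.add(new LinkedHashSet<>(clause.tokens));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Clause> clauses = Arrays.asList(
                new Clause("ipsum"),
                new Clause("ipsuFoo"),
                new Clause("tok1", "tok2")); // placeholder multi-token word
        System.out.println(toTermGroups(clauses));
    }
}
```

The two wildcard expansions end up in separate sets, while the phrase tokens
stay in a single set so they are still required together.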





--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
