jackrabbit-dev mailing list archives

From "Marcel Reutegger (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-820) Add support for query result highlighting
Date Wed, 28 Mar 2007 10:10:32 GMT

    [ https://issues.apache.org/jira/browse/JCR-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484794 ]

Marcel Reutegger commented on JCR-820:
--------------------------------------

Committed initial version: 523251

The query languages now support an excerpt function that returns highlighted fragments for
the current node in a result row.

The excerpt is a simple XML fragment. An example fragment could look like this for the query
terms 'jackrabbit' and 'query':

<excerpt>
     <fragment>
          <highlight>Jackrabbit</highlight> implements both the mandatory XPath and optional SQL
          <highlight>query</highlight> syntax.
     </fragment>
     <fragment>
          Before parsing the XPath <highlight>query</highlight> in <highlight>Jackrabbit</highlight>,
          the statement is surrounded
     </fragment>
</excerpt>
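
An application will usually want to turn such a fragment into markup for a result list.
The following is only a sketch of one way to do that with the standard Java XML APIs,
not part of the committed change. The class name ExcerptRenderer, the choice of <b> tags
and the " ... " separator between fragments are assumptions made for illustration; the
excerpt string is assumed to have been read from the result row already.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: renders an <excerpt> XML fragment as a one-line HTML snippet.
class ExcerptRenderer {

    static String toHtml(String excerptXml) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(excerptXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed excerpt", e);
        }
        NodeList fragments = doc.getDocumentElement().getElementsByTagName("fragment");
        StringBuilder html = new StringBuilder();
        for (int i = 0; i < fragments.getLength(); i++) {
            if (i > 0) {
                html.append(" ... "); // assumed separator between fragments
            }
            html.append(renderFragment(fragments.item(i)));
        }
        return html.toString();
    }

    // Wraps <highlight> children in <b> tags and keeps plain text as escaped text.
    private static String renderFragment(Node fragment) {
        StringBuilder sb = new StringBuilder();
        NodeList children = fragment.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.ELEMENT_NODE && "highlight".equals(child.getNodeName())) {
                sb.append("<b>").append(escape(child.getTextContent())).append("</b>");
            } else if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(escape(child.getNodeValue()));
            }
        }
        // Collapse the indentation and line breaks of the XML into single spaces.
        return sb.toString().replaceAll("\\s+", " ").trim();
    }

    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

When highlighting is disabled the fragments contain no highlight elements, and the same
code simply returns the escaped plain text.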

Example queries:

//element(nt:resource)[jcr:contains(., 'jackrabbit')]/rep:excerpt(.)

select excerpt(.) from nt:resource where contains(., 'jackrabbit')

By default the excerpt function returns only plain fragments without highlight elements,
because highlighting requires additional token offset information to be indexed. To enable
term highlighting, set the following configuration parameter:

<param name="supportHighlighting" value="true"/>
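
For context, a sketch of where such a parameter would typically sit, assuming the usual
SearchIndex element of a workspace configuration. The class name and the path parameter
shown here are taken from a common default setup and are assumptions, not part of this
change:

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <param name="supportHighlighting" value="true"/>
</SearchIndex>
```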

The supportHighlighting parameter defaults to false for performance reasons. When it is set
to true, the values of string properties and the text extracts of binary properties are
stored in the Lucene index. Because Lucene loads all stored fields whenever a document is
requested, this affects performance. Lucene 2.1 allows this behaviour to be controlled, so
that only specified fields are loaded. Once Jackrabbit switches to Lucene 2.1, the query
handler should read the stored fulltext extract only when it is really needed.

Similarly, once Jackrabbit switches to Lucene 2.1, it should have a custom field implementation
that allows a field to be stored with a reader value. Currently, enabling highlighting
effectively disables deferred text extraction. With a custom field implementation, deferred
text extraction will work again even when highlighting is enabled.

> Add support for query result highlighting
> -----------------------------------------
>
>                 Key: JCR-820
>                 URL: https://issues.apache.org/jira/browse/JCR-820
>             Project: Jackrabbit
>          Issue Type: New Feature
>          Components: query
>            Reporter: Marcel Reutegger
>            Priority: Minor
>
> Highlighting matches in a query result list is regularly needed for an application. The
> query languages should support a pseudo property or function that allows one to retrieve text
> fragments with highlighted matches from the content of the matching node.
> To support this feature the following enhancements are required:
> - define a pseudo property or function that returns the text excerpt and can be used
>   in the select clause
> - the index needs to *store* the original text it used when the node was indexed. this
>   also includes extracted text from binary properties.
> - text fragments must be created based on the original text, the query and index information

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

