lucene-dev mailing list archives

From "David Smiley (JIRA)" <>
Subject [jira] Created: (SOLR-1954) Highlighter component should expose snippet character offsets and the score.
Date Tue, 15 Jun 2010 20:17:25 GMT
Highlighter component should expose snippet character offsets and the score.

                 Key: SOLR-1954
             Project: Solr
          Issue Type: New Feature
          Components: highlighter
            Reporter: David Smiley
            Priority: Minor

The Highlighter Component currently exposes neither the snippet character offsets nor the
score. There is a TODO in DefaultSolrHighlighter indicating the intention to add this eventually.
This information is needed when doing highlighting on external content. The data is already there,
so it's pretty easy to output it in some way. The challenge is deciding on the output format and
its ramifications for backwards compatibility. Unfortunately, the current highlighter component response
structure doesn't lend itself to adding any new data; I wish the original
implementer had had more foresight. Worse, all the highlighting tests assume this structure.
Here is a snippet of the current response structure from searching Solr's sample data for "sdram",
for reference:
<lst name="highlighting">
  <lst name="VS1GB400C3">
    <arr name="text">
      <str>CORSAIR ValueSelect 1GB 184-Pin DDR &lt;em&gt;SDRAM&lt;/em&gt; Unbuffered DDR 400 (PC 3200) System Memory - Retail</str>
    </arr>
  </lst>
</lst>


Perhaps, as a little hack, we could introduce a pseudo-field named by concatenating the matching
field's name with "_startCharOffset" (e.g. text_startCharOffset for the text field). This would be
an array of ints, one per snippet. Likewise, there would be parallel arrays for the end character
offsets and the scores.
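To make the parallel-array idea concrete, here is a minimal Java sketch of the proposed layout for one field of one document. It is not Solr's API: the Fragment record, the method names, and the "_endCharOffset"/"_score" suffixes are hypothetical (the suffixes are assumed by analogy with "_startCharOffset"), and a real change would populate Solr's own response structures inside DefaultSolrHighlighter rather than plain Java maps. The score value in main is made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SnippetOffsetsSketch {

    // Hypothetical holder for one highlighted fragment (not a Solr class).
    // The offsets index into the original field text, not into the marked-up snippet.
    record Fragment(String markedUpText, int startCharOffset, int endCharOffset, float score) {}

    // Builds the entries for one field: the existing snippet array plus three
    // parallel arrays named with the proposed pseudo-field suffixes.
    // ASSUMPTION: "_endCharOffset" and "_score" are chosen by analogy with "_startCharOffset".
    static Map<String, List<?>> toResponseEntries(String field, List<Fragment> fragments) {
        List<String> snippets = new ArrayList<>();
        List<Integer> starts = new ArrayList<>();
        List<Integer> ends = new ArrayList<>();
        List<Float> scores = new ArrayList<>();
        for (Fragment f : fragments) {
            snippets.add(f.markedUpText());
            starts.add(f.startCharOffset());
            ends.add(f.endCharOffset());
            scores.add(f.score());
        }
        Map<String, List<?>> entries = new LinkedHashMap<>();
        entries.put(field, snippets); // unchanged, so existing clients keep working
        entries.put(field + "_startCharOffset", starts);
        entries.put(field + "_endCharOffset", ends);
        entries.put(field + "_score", scores);
        return entries;
    }

    // Client side: uses the i-th offsets to cut the matching span out of an
    // external copy of the text (the "external content" use case).
    static String spanFromExternalText(String externalText, Map<String, List<?>> entries,
                                       String field, int i) {
        int start = (Integer) entries.get(field + "_startCharOffset").get(i);
        int end = (Integer) entries.get(field + "_endCharOffset").get(i);
        return externalText.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail";
        // Hypothetical fragment covering "DDR SDRAM Unbuffered"; the score is illustrative only.
        int start = text.indexOf("DDR SDRAM");
        int end = start + "DDR SDRAM Unbuffered".length();
        Fragment frag = new Fragment("DDR <em>SDRAM</em> Unbuffered", start, end, 0.5f);

        Map<String, List<?>> entries = toResponseEntries("text", List.of(frag));
        System.out.println(entries);
        System.out.println(spanFromExternalText(text, entries, "text", 0));
    }
}
```

Because the existing per-field snippet array is left untouched, clients that only read snippets would be unaffected; the trade-off is that the i-th entries of four separate arrays must stay aligned, which nothing in the structure itself enforces.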


This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
