lucene-dev mailing list archives

From "Erik Hatcher (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1954) Highlighter component should expose snippet character offsets and the score.
Date Fri, 18 Jun 2010 14:50:23 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880214#action_12880214 ]

Erik Hatcher commented on SOLR-1954:
------------------------------------

No, we're not talking about the same thing. Here's what I'm suggesting:

{code}
{
  'responseHeader'=>{
    'status'=>0,
    'QTime'=>15},
  'response'=>{'numFound'=>3,'start'=>0,'maxScore'=>0.10558263,'docs'=>[
      {
        'id'=>'IW-02',
        'name'=>'iPod & iPod Mini USB 2.0 Cable',
        'manu'=>'Belkin',
        'weight'=>2.0,
        'price'=>11.5,
        'popularity'=>1,
        'inStock'=>false,
        'store_0_d'=>37.7752,
        'store_1_d'=>-122.4232,
        'store'=>'37.7752,-122.4232',
        'manufacturedate_dt'=>'2006-02-14T23:55:59Z',
        'cat'=>[
          'electronics',
          'connector'],
        'features'=>[
          'car power adapter for iPod, white'],
        'score'=>0.10558263}]
  },
  'facet_counts'=>{
    'facet_queries'=>{},
    'facet_fields'=>{
      'cat'=>[
        'electronics',3,
        'connector',2,
        'music',1],
      'manu_exact'=>[
        'Belkin',2,
        'Apple Computer Inc.',1]},
    'facet_dates'=>{}},
  'highlighting'=>{
    'IW-02'=>{
      'features'=>['car power adapter for <em>iPod</em>, white'],
      'name'=>['<em>iPod</em> & <em>iPod</em> Mini USB 2.0 Cable']}},
  'highlighting-extended-info'=>{
    'IW-02'=>{
      'text_startPos'=>[5]}},
  'spellcheck'=>{
    'suggestions'=>[]}}
{code}

That way the 'highlighting' section remains untouched, and the extra data goes in a separate 'highlighting-extended-info' section (let's use a shorter name, though) that is a direct child of the root response, just like 'highlighting' is.
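To illustrate why a sibling section leaves existing clients alone, here is a minimal Python sketch that reads a response shaped like the example above, already parsed into dicts. It assumes the placeholder section name 'highlighting-extended-info' and a per-field key pattern `<field>_startPos` (as in `text_startPos`); both are only proposals in this thread, and the helper `snippets_with_offsets` is hypothetical. Snippets are paired with offsets by list position, and a response without the new section simply yields `None` offsets.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical names taken from the proposal in this comment; not a Solr API.
EXTENDED_SECTION = "highlighting-extended-info"
START_SUFFIX = "_startPos"


def snippets_with_offsets(
    response: Dict[str, Any], doc_id: str, field: str
) -> List[Tuple[str, Optional[int]]]:
    """Pair each highlighted snippet with its start offset, if one was sent.

    Snippets come from the untouched 'highlighting' section; offsets come
    from the separate extended section, matched by position in the list.
    """
    snippets = response.get("highlighting", {}).get(doc_id, {}).get(field, [])
    extended = response.get(EXTENDED_SECTION, {}).get(doc_id, {})
    starts = extended.get(field + START_SUFFIX, [])
    # Servers that don't send the extended section leave offsets as None.
    return [
        (snippet, starts[i] if i < len(starts) else None)
        for i, snippet in enumerate(snippets)
    ]


response = {
    "highlighting": {
        "IW-02": {
            "features": ["car power adapter for <em>iPod</em>, white"],
            "name": ["<em>iPod</em> & <em>iPod</em> Mini USB 2.0 Cable"],
        }
    },
    EXTENDED_SECTION: {
        # Illustrative value: the whole stored value fits in one fragment,
        # so that snippet starts at character offset 0.
        "IW-02": {"features_startPos": [0]},
    },
}

for name in ("features", "name"):
    print(name, snippets_with_offsets(response, "IW-02", name))
```

A client that only ever reads `response["highlighting"]` is unaffected, which is the point of keeping the new data out of that section.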


> Highlighter component should expose snippet character offsets and the score.
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-1954
>                 URL: https://issues.apache.org/jira/browse/SOLR-1954
>             Project: Solr
>          Issue Type: New Feature
>          Components: highlighter
>            Reporter: David Smiley
>            Priority: Minor
>         Attachments: SOLR-1954_start_and_end_offsets.patch
>
>
> The Highlighter Component does not currently expose the snippet character offsets or the score. There is a TODO in DefaultSolrHighlighter indicating the intention to add this eventually. This information is needed when doing highlighting on external content. The data is there, so it's pretty easy to output it in some way. The challenge is deciding on the output format and its ramifications for backwards compatibility. Unfortunately, the current highlighter component response structure doesn't lend itself to adding any new data; I wish the original implementer had had more foresight. On top of that, all the highlighting tests assume this structure. Here is a snippet of the current response structure in Solr's sample data when searching for "sdram", for reference:
> {code:xml}
> <lst name="highlighting">
>  <lst name="VS1GB400C3">
>   <arr name="text">
>    <str>CORSAIR ValueSelect 1GB 184-Pin DDR &lt;em&gt;SDRAM&lt;/em&gt; Unbuffered DDR 400 (PC 3200) System Memory - Retail</str>
>   </arr>
>  </lst>
> </lst>
> {code}
> Perhaps, as a little hack, we could introduce a pseudo field whose name is the matching field's name concatenated with "_startCharOffset" (e.g. text_startCharOffset for the text field). It would hold an array of ints, one per snippet. Likewise, there would be parallel arrays for the end character offset and the score.
> Thoughts?
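The pseudo-field idea from the issue description can be sketched the same way. This Python sketch assumes a per-document dict like the 'highlighting' entry for VS1GB400C3 above and writes parallel arrays under `<field>_startCharOffset`, `<field>_endCharOffset`, and `<field>_score`. The first suffix comes from the proposal; the other two suffixes, the `Fragment` type, the helper names, the exclusive-end convention, and the numbers in the usage are all hypothetical. It also shows the compatibility cost: a client that iterates every key of a document's entry now sees the pseudo fields too and has to filter them out.

```python
from typing import Dict, List, NamedTuple

# Suffix names are assumptions based on the proposal; not an actual Solr API.
PSEUDO_SUFFIXES = ("_startCharOffset", "_endCharOffset", "_score")


class Fragment(NamedTuple):
    text: str     # highlighted snippet, markup included
    start: int    # character offset of the fragment in the stored value
    end: int      # exclusive end offset (assumed convention)
    score: float  # fragment score


def add_pseudo_fields(doc_hl: Dict[str, list], field: str,
                      fragments: List[Fragment]) -> None:
    """Store snippets under `field` plus parallel pseudo-field arrays."""
    doc_hl[field] = [f.text for f in fragments]
    doc_hl[field + "_startCharOffset"] = [f.start for f in fragments]
    doc_hl[field + "_endCharOffset"] = [f.end for f in fragments]
    doc_hl[field + "_score"] = [f.score for f in fragments]


def real_fields(doc_hl: Dict[str, list]) -> List[str]:
    """Names a client should treat as highlighted fields, not pseudo fields."""
    return [name for name in doc_hl if not name.endswith(PSEUDO_SUFFIXES)]


# Made-up fragments, purely to show the resulting shape.
doc_hl: Dict[str, list] = {}
add_pseudo_fields(doc_hl, "text", [
    Fragment("DDR <em>SDRAM</em> Unbuffered", 40, 62, 0.8),
])
print(real_fields(doc_hl))
```

The suffix check would also hide a genuine field whose name happens to end in, say, `_score`, which is one practical argument for keeping the extra data out of the per-field entries altogether.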
