lucene-dev mailing list archives

From "Mark Miller (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-1685) Make the Highlighter use SpanScorer by default
Date Sun, 02 Aug 2009 23:18:14 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-1685:
--------------------------------

    Attachment: LUCENE-1685.patch

Another rev making things a little easier.

QueryScorer now takes a TokenStream rather than a CachingTokenFilter. If the query has any position-sensitive
clauses, the TokenStream will be wrapped in a CachingTokenFilter unless it already is one.
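The wrapping matters because a token stream can normally be read only once, while a position-sensitive query needs the tokens twice: once to work out which positions match, and again while fragments are built. A minimal toy model of that idea follows; `ToyTokenStream`, `ListStream`, `ToyCachingFilter` and `WrapDemo` are hypothetical stand-ins, not Lucene's TokenStream or CachingTokenFilter, and only the "wrap if position-sensitive and not already caching" rule comes from the message.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified one-pass token stream (NOT Lucene's TokenStream).
interface ToyTokenStream {
    String next(); // returns null once the stream is exhausted
}

// One-shot stream over a fixed list of terms: once read, the tokens are gone.
class ListStream implements ToyTokenStream {
    private final Iterator<String> it;
    ListStream(List<String> terms) { this.it = terms.iterator(); }
    public String next() { return it.hasNext() ? it.next() : null; }
}

// Hypothetical analogue of a caching filter: drains its input on first use, then replays.
class ToyCachingFilter implements ToyTokenStream {
    private final ToyTokenStream input;
    private List<String> cache; // null until the first read
    private int pos = 0;

    ToyCachingFilter(ToyTokenStream input) { this.input = input; }

    public String next() {
        if (cache == null) {
            // Read the underlying one-pass stream completely and remember every token.
            cache = new ArrayList<>();
            for (String t = input.next(); t != null; t = input.next()) cache.add(t);
        }
        return pos < cache.size() ? cache.get(pos++) : null;
    }

    // Rewind to the first cached token so the same tokens can be read again.
    void reset() { pos = 0; }
}

class WrapDemo {
    // Wrap only when the query is position-sensitive and the stream is not already caching.
    static ToyTokenStream maybeWrap(ToyTokenStream ts, boolean positionSensitive) {
        if (positionSensitive && !(ts instanceof ToyCachingFilter)) {
            return new ToyCachingFilter(ts);
        }
        return ts;
    }
}
```

Reading a wrapped stream to the end, calling `reset()`, and reading again yields the same tokens, which is what lets one stream serve both the scorer and the highlighter in this model.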

This also removes the need to call setTokenStream after constructing a QueryScorer and between
calls to getBestFragment. Instead, the QueryScorer uses the new init(TokenStream) method, which the
Highlighter already calls, so the user no longer has to make that call.

init(TokenStream) can now return a new TokenStream for the Highlighter to continue using (i.e.,
the QueryScorer may return a CachingTokenFilter if there is a position-sensitive clause in
the query), or null to keep using the same TokenStream.
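The init(TokenStream) contract can be modeled in a few lines. This is a toy sketch under stated assumptions: `ToyScorer`, `PlainScorer`, `PositionalScorer` and `ToyHighlighter` are hypothetical names, and a `java.util.Iterator<String>` stands in for a TokenStream. Only the rule itself (a non-null return replaces the stream, null keeps it) comes from the message.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a highlighter scorer; Iterator<String> plays the TokenStream role.
interface ToyScorer {
    // Return a replacement stream, or null to keep using the stream passed in.
    Iterator<String> init(Iterator<String> tokens);
}

// Keeps the caller's stream: models a query with no position-sensitive clauses.
class PlainScorer implements ToyScorer {
    public Iterator<String> init(Iterator<String> tokens) { return null; }
}

// Swaps in a replacement: drains the original stream into a list and returns an iterator over the copy.
class PositionalScorer implements ToyScorer {
    public Iterator<String> init(Iterator<String> tokens) {
        List<String> cached = new ArrayList<>();
        tokens.forEachRemaining(cached::add);
        return cached.iterator();
    }
}

class ToyHighlighter {
    // The highlighter side of the contract: it calls init() and honors a non-null result.
    static List<String> collect(ToyScorer scorer, Iterator<String> tokens) {
        Iterator<String> replacement = scorer.init(tokens);
        Iterator<String> stream = (replacement != null) ? replacement : tokens;
        List<String> seen = new ArrayList<>();
        stream.forEachRemaining(seen::add);
        return seen;
    }
}
```

Because the highlighter performs the swap, calling code in this model only builds the stream and hands it over; it never touches init() itself.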

You can now use the SpanScorer (now named QueryScorer) the same way you could use the old QueryScorer
implementation:

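    // `this` is the enclosing class, assumed here to implement Formatter (as in a test case).
    QueryScorer scorer = new QueryScorer(query, FIELD_NAME);
    Highlighter highlighter = new Highlighter(this, scorer);
    // Break the text into fragments of roughly 40 characters.
    highlighter.setTextFragmenter(new SimpleFragmenter(40));

    for (int i = 0; i < hits.length(); i++) {
      String text = hits.doc(i).get(FIELD_NAME);
      // A fresh TokenStream per document; no setTokenStream call is needed, because
      // the Highlighter passes this stream to the scorer's init(TokenStream).
      TokenStream tokenStream = analyzer.tokenStream(FIELD_NAME, new StringReader(text));

      String result = highlighter.getBestFragments(tokenStream, text, maxNumFragmentsRequired,
          "...");
      System.out.println("\t" + result);
    }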

> Make the Highlighter use SpanScorer by default
> ----------------------------------------------
>
>                 Key: LUCENE-1685
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1685
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1685.patch, LUCENE-1685.patch
>
>
> I've always thought this made sense, but frankly, it took me a year to get the SpanScorer
included with Lucene at all, so I was pretty much ready to move on after it got in, rather
than push for it as a default.
> I think it makes sense as the default in Solr as well, and I mentioned that back when
it was put in, but alas, it's an option there as well.
> The Highlighter package has no back-compat requirement, but custom has been conservative, which is one
reason I haven't pushed for this change before. Might it be best to actually make the switch in
Lucene 3? I could go either way: as is, I know a bunch of people use it, but I'm betting it's a
large minority. It has never been listed in a changes entry and it's not in LIA 1 (Lucene in Action,
first edition), so you pretty much have to stumble upon it and figure out what it's for.
> I'll point out again that it's just as fast as the standard scorer for any clause of a
query that is not position-sensitive. Position-sensitive query clauses will obviously be somewhat
slower to highlight, but that is because they are highlighted correctly instead of having their
positions ignored.
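To see why position matters, compare a term-only highlighter with a position-aware one for the phrase "quick fox". This is a toy illustration, not the SpanScorer algorithm: the hypothetical `termOnly` marks every token that is one of the phrase's terms, while the hypothetical `phraseAware` marks tokens only where the whole phrase occurs at consecutive positions, which means extra work per position.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PhraseHighlightDemo {
    // Term-only: mark any token that is one of the phrase's terms, ignoring positions.
    static boolean[] termOnly(List<String> tokens, List<String> phrase) {
        Set<String> terms = new HashSet<>(phrase);
        boolean[] hit = new boolean[tokens.size()];
        for (int i = 0; i < tokens.size(); i++) hit[i] = terms.contains(tokens.get(i));
        return hit;
    }

    // Position-aware: mark tokens only where the phrase matches at consecutive positions.
    static boolean[] phraseAware(List<String> tokens, List<String> phrase) {
        boolean[] hit = new boolean[tokens.size()];
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            if (tokens.subList(start, start + phrase.size()).equals(phrase)) {
                for (int k = 0; k < phrase.size(); k++) hit[start + k] = true;
            }
        }
        return hit;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("the", "quick", "dog", "saw", "a", "quick", "fox");
        List<String> phrase = Arrays.asList("quick", "fox");
        // prints [false, true, false, false, false, true, true]
        System.out.println(Arrays.toString(termOnly(doc, phrase)));
        // prints [false, false, false, false, false, true, true]
        System.out.println(Arrays.toString(phraseAware(doc, phrase)));
    }
}
```

The term-only version wrongly highlights the first "quick", which is not part of the phrase; the position-aware version highlights only the real "quick fox" match.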

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



