lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-1685) Make the Highlighter use SpanScorer by default
Date Fri, 12 Jun 2009 10:02:07 GMT


Michael McCandless commented on LUCENE-1685:

Why not deprecate QueryScorer?  It's buggy, and leaving it in there, with such a juicy name,
looking like the right choice, just makes Lucene's (highlighter's) quality look bad.  Correctness
trumps performance.

And then the javadocs should clearly favor SpanScorer... and I would include a clear code
fragment showing how to use it all, in context.  EG this is what LIA2 currently has, which
is fine to copy/modify/etc. to get into the javadocs:

  public void testHits() throws Exception {
    IndexSearcher searcher = new IndexSearcher(TestUtil.getBookIndexDirectory());
    TermQuery query = new TermQuery(new Term("title", "action"));
    TopDocs hits = searcher.search(query, 10);

    // The scorer and fragmenter are set per hit below, so start with no scorer.
    Highlighter highlighter = new Highlighter(null);
    Analyzer analyzer = new SimpleAnalyzer();
    for (int i = 0; i < hits.scoreDocs.length; i++) {
      Document doc = searcher.doc(hits.scoreDocs[i].doc);
      String title = doc.get("title");

      // Use term vectors if they were indexed, else re-analyze the stored text.
      TokenStream stream = TokenSources.getAnyTokenStream(searcher.getIndexReader(),
                                                          hits.scoreDocs[i].doc,
                                                          "title",
                                                          doc,
                                                          analyzer);
      SpanScorer scorer = new SpanScorer(query, "title",
                                         new CachingTokenFilter(stream));
      Fragmenter fragmenter = new SimpleSpanFragmenter(scorer);
      highlighter.setFragmentScorer(scorer);
      highlighter.setTextFragmenter(fragmenter);

      String fragment =
          highlighter.getBestFragment(stream, title);

      System.out.println(fragment);
    }
  }

It would also be nice to simplify that usage, eg, is there some way to not have to make a
SpanScorer (and, by extension, fragmenter) per query, but instead make it up-front and add
a setter for the new TokenStream for each doc?  (Having to create Highlighter(null) is awkward).
 Or I suppose we could simply make a new Highlighter, SpanScorer, SimpleSpanFragmenter per-hit,
but that seems wasteful.
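
A rough sketch of the setter-based shape suggested above, to make the lifecycle concrete. It deliberately uses no Lucene classes: `ReusableTermHighlighter` and `setTokens` are hypothetical names, standing in for a query-scoped object that is built once and handed each document's tokens through a setter. It matches single terms only and ignores positions, so it illustrates the reuse pattern, not what SpanScorer actually does.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a query-scoped highlighter; not a Lucene class. */
class ReusableTermHighlighter {
    private final Set<String> queryTerms;  // fixed for the lifetime of the query
    private List<String> tokens;           // swapped in for each document

    ReusableTermHighlighter(String... terms) {
        this.queryTerms = new HashSet<>();
        for (String term : terms) {
            this.queryTerms.add(term.toLowerCase());
        }
    }

    /** Per-document setter: replaces the token source without rebuilding query state. */
    void setTokens(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Wraps every token that matches a query term in <B>...</B>. */
    String highlight() {
        if (tokens == null) {
            throw new IllegalStateException("call setTokens() before highlight()");
        }
        StringBuilder out = new StringBuilder();
        for (String token : tokens) {
            if (out.length() > 0) {
                out.append(' ');
            }
            if (queryTerms.contains(token.toLowerCase())) {
                out.append("<B>").append(token).append("</B>");
            } else {
                out.append(token);
            }
        }
        return out.toString();
    }
}
```

With this shape the per-hit loop shrinks to a `setTokens(...)` call followed by `highlight()`, and nothing query-related is rebuilt per document. Whether the real SpanScorer can be restructured this way depends on how much of its state is derived from the token stream.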

> Make the Highlighter use SpanScorer by default
> ----------------------------------------------
>                 Key: LUCENE-1685
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>            Priority: Minor
>             Fix For: 2.9
> I've always thought this made sense, but frankly, it took me a year to get the SpanScorer
> included with Lucene at all, so I was pretty much ready to move on after it got in, rather
> than push for it as a default.
> I think it makes sense as the default in Solr as well, and I mentioned that back when
> it was put in, but alas, it's an option there as well.
> The Highlighter package has no back compat req, but custom has been conservative - one
> reason I haven't pushed for this change before. Might be best to actually make the switch in
> 3? I could go either way - as is, I know a bunch of people use it, but I'm betting it's the
> large minority. It has never been listed in a changes entry and it's not in LIA 1, so you pretty
> much have to stumble upon it, and figure out what it's for.
> I'll point out again that it's just as fast as the standard scorer for any clause of a
> query that is not position sensitive. Position sensitive query clauses will obviously be somewhat
> slower to highlight, but that is because they will be highlighted correctly rather than ignoring

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
