lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1685) Make the Highlighter use SpanScorer by default
Date Fri, 12 Jun 2009 10:02:07 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718773#action_12718773 ]

Michael McCandless commented on LUCENE-1685:
--------------------------------------------

Why not deprecate QueryScorer?  It's buggy, and leaving it in there, with such a juicy name,
looking like the right choice, just makes Lucene's (highlighter's) quality look bad.  Correctness
trumps performance.

And then the javadocs should clearly favor SpanScorer... and I would include a clear code
fragment showing how to use it all, in context.  EG this is what LIA2 currently has, which
is fine to copy/modify/etc. to get into the javadocs:

{code}
  public void testHits() throws Exception {
    IndexSearcher searcher = new IndexSearcher(TestUtil.getBookIndexDirectory());
    TermQuery query = new TermQuery(new Term("title", "action"));
    TopDocs hits = searcher.search(query, 10);

    Highlighter highlighter = new Highlighter(null);
    Analyzer analyzer = new SimpleAnalyzer();
    
    for (int i = 0; i < hits.scoreDocs.length; i++) {
      Document doc = searcher.doc(hits.scoreDocs[i].doc);
      String title = doc.get("title");

      TokenStream stream = TokenSources.getAnyTokenStream(searcher.getIndexReader(),
                                                          hits.scoreDocs[i].doc,
                                                          "title",
                                                          doc,
                                                          analyzer);
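      // Wrap the stream once so the scorer and the highlighter read the same cached tokens.
      CachingTokenFilter cached = new CachingTokenFilter(stream);
      SpanScorer scorer = new SpanScorer(query, "title", cached);
      Fragmenter fragmenter = new SimpleSpanFragmenter(scorer);
      highlighter.setFragmentScorer(scorer);
      highlighter.setTextFragmenter(fragmenter);

      // Rewind the cache: for position-sensitive queries the scorer may already have consumed it.
      cached.reset();
      String fragment = highlighter.getBestFragment(cached, title);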

      System.out.println(fragment);
    }
  }
{code}

It would also be nice to simplify that usage, eg, is there some way to not have to make a
SpanScorer (and, by extension, fragmenter) per query, but instead make it up-front and add
a setter for the new TokenStream for each doc?  (Having to create Highlighter(null) is awkward).
 Or I suppose we could simply make a new Highlighter, SpanScorer, SimpleSpanFragmenter per-hit,
but that seems wasteful.
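
To make the "build it up-front, add a setter for each doc" idea concrete, here is a minimal sketch. It deliberately does not use the Lucene classes: DocScorer, SingleTermScorer, SketchHighlighter and the init(...) method are all hypothetical names made up for illustration, and token lists stand in for a TokenStream.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch only; none of these types exist in Lucene.
class ReusableScorerSketch {

    // A scorer created once per query and re-pointed at each document's tokens.
    interface DocScorer {
        void init(List<String> docTokens);   // hypothetical per-document setter
        boolean matches(int position);
    }

    // Simplest possible scorer: matches a single term, ignoring position.
    static class SingleTermScorer implements DocScorer {
        private final String term;
        private List<String> tokens = Collections.emptyList();

        SingleTermScorer(String term) {
            this.term = term;
        }

        public void init(List<String> docTokens) {
            this.tokens = docTokens;         // swap per-document state, keep the query
        }

        public boolean matches(int position) {
            return term.equals(tokens.get(position));
        }
    }

    // Highlighter built once with its scorer, so no null constructor argument is needed.
    static class SketchHighlighter {
        private final DocScorer scorer;

        SketchHighlighter(DocScorer scorer) {
            this.scorer = scorer;
        }

        String highlight(List<String> docTokens) {
            scorer.init(docTokens);          // the only per-document setup
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < docTokens.size(); i++) {
                if (i > 0) out.append(' ');
                String t = docTokens.get(i);
                out.append(scorer.matches(i) ? "<B>" + t + "</B>" : t);
            }
            return out.toString();
        }
    }

    public static void main(String[] args) {
        SketchHighlighter h = new SketchHighlighter(new SingleTermScorer("action"));
        System.out.println(h.highlight(Arrays.asList("lucene", "in", "action")));
        System.out.println(h.highlight(Arrays.asList("action", "items")));
    }
}
```

The point of the shape is that the per-hit loop shrinks to one call, and the scorer and highlighter are allocated once per query rather than once per hit.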

> Make the Highlighter use SpanScorer by default
> ----------------------------------------------
>
>                 Key: LUCENE-1685
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1685
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>            Priority: Minor
>             Fix For: 2.9
>
>
> I've always thought this made sense, but frankly, it took me a year to get the SpanScorer
included with Lucene at all, so I was pretty much ready to move on after it got in, rather
than push for it as a default.
> I think it makes sense as the default in Solr as well, and I mentioned that back when
it was put in, but alas, it's an option there as well.
> The Highlighter package has no back compat req, but custom has been conservative - one
reason I haven't pushed for this change before. Might be best to actually make the switch in
3? I could go either way - as is, I know a bunch of people use it, but I'm betting it's the
large minority. It has never been listed in a changes entry and it's not in LIA 1, so you pretty
much have to stumble upon it, and figure out what it's for.
> I'll point out again that it's just as fast as the standard scorer for any clause of a
query that is not position sensitive. Position sensitive query clauses will obviously be somewhat
slower to highlight, but that is because they will be highlighted correctly rather than ignoring
position.
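
The difference between ignoring position and respecting it can be shown with a toy example. This is only a sketch under simplifying assumptions: it works on plain token lists rather than Lucene queries, handles only an exact phrase, and the class and method names (PhraseHighlightSketch, highlightTerms, highlightPhrase) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; these are not Lucene classes.
class PhraseHighlightSketch {

    // Position-insensitive: marks every token equal to any phrase term.
    static String highlightTerms(List<String> tokens, List<String> phrase) {
        boolean[] hit = new boolean[tokens.size()];
        for (int i = 0; i < tokens.size(); i++) {
            hit[i] = phrase.contains(tokens.get(i));
        }
        return render(tokens, hit);
    }

    // Position-sensitive: marks tokens only where the whole phrase occurs
    // in order at consecutive positions. The extra scan is the added cost.
    static String highlightPhrase(List<String> tokens, List<String> phrase) {
        boolean[] hit = new boolean[tokens.size()];
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            if (tokens.subList(start, start + phrase.size()).equals(phrase)) {
                for (int k = 0; k < phrase.size(); k++) {
                    hit[start + k] = true;
                }
            }
        }
        return render(tokens, hit);
    }

    // Joins tokens with spaces, wrapping marked ones in <B>...</B>.
    private static String render(List<String> tokens, boolean[] hit) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) out.append(' ');
            out.append(hit[i] ? "<B>" + tokens.get(i) + "</B>" : tokens.get(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("the", "quick", "fox", "met", "a", "quick", "brown", "fox");
        List<String> phrase = Arrays.asList("quick", "fox");
        System.out.println(highlightTerms(doc, phrase));
        System.out.println(highlightPhrase(doc, phrase));
    }
}
```

For the phrase "quick fox", the term-only version also marks the second "quick" and the last "fox", which do not form the phrase; the position-aware version marks only the adjacent pair.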

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.