Phrase highlighting (and spans) would certainly be useful, as would multi-field support. Before we leap into adding code to the highlighter, though, I think it's worth considering what we are trying to fix here in a more general sense.

As a basic principle, I think highlighting should attempt to show the user what the search engine saw as important in the document. With that principle in mind, if I search for:

    ("Doug Cutting" AND lucene) OR google

I shouldn't highlight "Doug Cutting" in a matching document that has google but not lucene.

If we are going to try to be true to the query logic in our display, we end up having to re-implement a lot of that logic in the highlighter, e.g. taking account of slop factors etc.

We could avoid over-complicating the highlighter in this way if the different queries could provide information of use in highlighting: a variant of the "explain" function that would describe not only the scoring but also the sections of the document to which these scores relate.

Does this approach sound feasible?

> There's a post over at SearchEngineWatch theorizing about how Google
> produces summaries.
>
> http://forums.searchenginewatch.com/showthread.php?threadid=5448
>
> Lucene's current highlighter doesn't easily support multi-fields, nor
> does it take phrasal matching into account. It might be useful to
> have a highlighter API that takes a Document and summarizes all of its
> fields, incorporating their boosts in fragment scores. Thoughts?
>
> Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
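
To make the "explain"-style idea from the reply above a little more concrete, here is a rough sketch of query nodes that report the token positions they actually matched, so that a failed AND clause suppresses its siblings' highlights. This is not Lucene code: HighlightSketch, Node, term, phrase, and, or and highlight are all hypothetical names, and whitespace tokenization, case-insensitive comparison and exact phrases (slop 0) are simplifying assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: none of these types or methods are Lucene APIs.
final class HighlightSketch {

    // A query node reports the token positions that contributed to its match.
    // An empty set means "this node did not match the document".
    interface Node {
        Set<Integer> matchedPositions(List<String> tokens);
    }

    static Node term(String word) {
        return tokens -> {
            Set<Integer> hits = new TreeSet<>();
            for (int i = 0; i < tokens.size(); i++) {
                if (tokens.get(i).equalsIgnoreCase(word)) hits.add(i);
            }
            return hits;
        };
    }

    // Exact phrase only (slop 0), to keep the sketch short.
    static Node phrase(String... words) {
        return tokens -> {
            Set<Integer> hits = new TreeSet<>();
            for (int start = 0; start + words.length <= tokens.size(); start++) {
                boolean all = true;
                for (int j = 0; j < words.length; j++) {
                    if (!tokens.get(start + j).equalsIgnoreCase(words[j])) {
                        all = false;
                        break;
                    }
                }
                if (all) {
                    for (int j = 0; j < words.length; j++) hits.add(start + j);
                }
            }
            return hits;
        };
    }

    // AND: if any clause fails, nothing under this node is reported.
    static Node and(Node... clauses) {
        return tokens -> {
            Set<Integer> hits = new TreeSet<>();
            for (Node clause : clauses) {
                Set<Integer> h = clause.matchedPositions(tokens);
                if (h.isEmpty()) return new TreeSet<>();
                hits.addAll(h);
            }
            return hits;
        };
    }

    // OR: report positions only from the clauses that matched.
    static Node or(Node... clauses) {
        return tokens -> {
            Set<Integer> hits = new TreeSet<>();
            for (Node clause : clauses) hits.addAll(clause.matchedPositions(tokens));
            return hits;
        };
    }

    // Wrap every reported token in <b>...</b>; tokens are split on whitespace.
    static String highlight(Node query, String text) {
        List<String> tokens = Arrays.asList(text.trim().split("\\s+"));
        Set<Integer> hits = query.matchedPositions(tokens);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) out.append(' ');
            String t = tokens.get(i);
            out.append(hits.contains(i) ? "<b>" + t + "</b>" : t);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // ("Doug Cutting" AND lucene) OR google
        Node q = or(and(phrase("Doug", "Cutting"), term("lucene")), term("google"));
        System.out.println(highlight(q, "Doug Cutting joined Google"));
        System.out.println(highlight(q, "Doug Cutting wrote Lucene"));
    }
}
```

With the example query, a document containing google but not lucene gets only "Google" marked, while a document containing both the phrase and lucene gets all three tokens marked. Real queries would also need to carry field, slop and score information, which this sketch leaves out.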