lucene-java-user mailing list archives

From mark harwood <>
Subject RE: Displaying search context
Date Fri, 23 Sep 2005 14:25:21 GMT
> What are the
> current limitations of the
> Lucene Highlighter?  How does it perform under high
> query load?  

The major bottlenecks are typically in retrieving
document content and then re-tokenizing with an
Analyzer - not the actual choice of highlighting code.
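The following standalone sketch shows where that cost goes. It is plain Java (16+, for records) and does not use Lucene's API. It assumes token offsets were recorded once at index time, much as Lucene can store term vectors with offsets. Query-time highlighting then becomes a set of offset lookups with no re-tokenization. `OffsetIndex`, `analyzeOnce`, `highlight`, and the letter/digit tokenizer standing in for an Analyzer are all hypothetical.

```java
import java.util.*;

// Hypothetical sketch, not Lucene's API: analyze the text once (as an index
// storing term vectors with offsets would), then highlight at query time
// from the stored offsets without running the tokenizer again.
final class OffsetIndex {
    // One token occurrence: characters [start, end) of the original text.
    record Span(int start, int end) {}

    // Stand-in for an Analyzer: lowercased runs of letters/digits.
    static Map<String, List<Span>> analyzeOnce(String text) {
        Map<String, List<Span>> vector = new HashMap<>();
        int i = 0, n = text.length();
        while (i < n) {
            while (i < n && !Character.isLetterOrDigit(text.charAt(i))) i++;
            int start = i;
            while (i < n && Character.isLetterOrDigit(text.charAt(i))) i++;
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                vector.computeIfAbsent(term, k -> new ArrayList<>()).add(new Span(start, i));
            }
        }
        return vector;
    }

    // Query time: only offset lookups happen here, no tokenization.
    static String highlight(String text, Map<String, List<Span>> vector,
                            Set<String> queryTerms, String pre, String post) {
        Set<String> normalized = new HashSet<>();
        for (String t : queryTerms) normalized.add(t.toLowerCase(Locale.ROOT));
        List<Span> hits = new ArrayList<>();
        for (String t : normalized) hits.addAll(vector.getOrDefault(t, List.of()));
        hits.sort(Comparator.comparingInt(Span::start));
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (Span s : hits) {
            sb.append(text, last, s.start()).append(pre)
              .append(text, s.start(), s.end()).append(post);
            last = s.end();
        }
        return sb.append(text, last, text.length()).toString();
    }
}
```

This trades space for time. Tokenization is paid once per document at index time instead of once per displayed hit, at the cost of a larger index. Retrieving the stored text itself is still required.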

I've not used the Nutch summarizer, so I couldn't say
what speed difference you might expect in the
highlighting stage. In terms of functionality, from a
quick glance at the code I would say it is probably
missing the following highlighter features:
* Choice of field (hardcoded to "content")
* Choice of Analyzer
* Re-ordering selected fragments to natural order
* Choice of markup (eg span vs <b>)
* Support for tokenStreams with overlapping tokens
* Support for term weightings in fragment selection
(eg IDF)
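Three of the listed features can be sketched in plain Java: IDF-weighted fragment selection, re-ordering the chosen fragments to natural order, and configurable markup. This is not the Lucene Highlighter's API. `FragmentSketch`, its methods, the pre-split fragments, and the caller-supplied IDF map (a real highlighter would derive weights from index statistics) are hypothetical simplifications.

```java
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Lucene's Highlighter API.
final class FragmentSketch {
    private static final Pattern WORD = Pattern.compile("\\w+");

    record Fragment(int index, String text, double score) {}

    // Lowercased word tokens: a crude stand-in for an Analyzer.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) out.add(m.group().toLowerCase(Locale.ROOT));
        return out;
    }

    static List<String> bestFragments(List<String> fragments, Map<String, Double> idf,
                                      int maxFragments, String pre, String post) {
        List<Fragment> scored = new ArrayList<>();
        for (int i = 0; i < fragments.size(); i++) {
            // Sum the IDF of each distinct query term present, so rare terms count more.
            double score = 0;
            for (String t : new HashSet<>(tokens(fragments.get(i)))) {
                score += idf.getOrDefault(t, 0.0);
            }
            if (score > 0) scored.add(new Fragment(i, fragments.get(i), score));
        }
        // Keep the highest-scoring fragments (stable sort keeps ties in document order)...
        scored.sort(Comparator.comparingDouble(Fragment::score).reversed());
        List<Fragment> top = new ArrayList<>(scored.subList(0, Math.min(maxFragments, scored.size())));
        // ...then restore natural document order so the summary reads sensibly.
        top.sort(Comparator.comparingInt(Fragment::index));
        List<String> out = new ArrayList<>();
        for (Fragment f : top) out.add(markup(f.text(), idf.keySet(), pre, post));
        return out;
    }

    // Wrap query terms in caller-chosen markup (eg "<b>"/"</b>" or a span).
    static String markup(String text, Set<String> terms, String pre, String post) {
        StringBuilder sb = new StringBuilder();
        Matcher m = WORD.matcher(text);
        int last = 0;
        while (m.find()) {
            sb.append(text, last, m.start());
            boolean hit = terms.contains(m.group().toLowerCase(Locale.ROOT));
            sb.append(hit ? pre + m.group() + post : m.group());
            last = m.end();
        }
        return sb.append(text, last, text.length()).toString();
    }
}
```

Selecting by score and then sorting by position is what gives "natural order". Without the second sort, the summary would list the best fragment first even when it appears late in the document.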

The Nutch summarizer also appears to drag in
Nutch-specific classes, eg using Nutch's Query object
rather than Lucene's.

Currently both summarizers can mistakenly highlight
individual terms from a phrase query in places where
only one of the phrase's terms actually occurs. This is
less than ideal, but the fix requires a major rewrite of
both highlighters' logic.
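The phrase problem arises when terms are matched independently of their positions. The sketch below shows a minimal position-aware alternative in plain Java. It assumes the text is tokenized with a simple word tokenizer standing in for an Analyzer, and it marks a term only where the complete phrase occurs at consecutive positions. `PhraseSketch` is hypothetical and ignores phrase slop and overlapping tokens.

```java
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: highlight phrase terms only where the whole phrase matches.
final class PhraseSketch {
    private static final Pattern WORD = Pattern.compile("\\w+");

    static String highlightPhrase(String text, List<String> phrase, String pre, String post) {
        // Tokenize, remembering each token's character offsets.
        List<int[]> spans = new ArrayList<>();
        List<String> terms = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
            terms.add(m.group().toLowerCase(Locale.ROOT));
        }
        // Mark a token only if it belongs to a full, in-order, adjacent match.
        boolean[] mark = new boolean[terms.size()];
        for (int p = 0; p + phrase.size() <= terms.size(); p++) {
            boolean match = true;
            for (int k = 0; k < phrase.size() && match; k++) {
                match = terms.get(p + k).equals(phrase.get(k).toLowerCase(Locale.ROOT));
            }
            if (match) {
                for (int k = 0; k < phrase.size(); k++) mark[p + k] = true;
            }
        }
        // Rebuild the text, wrapping only the marked tokens.
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (int i = 0; i < spans.size(); i++) {
            if (!mark[i]) continue;
            int[] s = spans.get(i);
            sb.append(text, last, s[0]).append(pre).append(text, s[0], s[1]).append(post);
            last = s[1];
        }
        return sb.append(text, last, text.length()).toString();
    }
}
```

In the test case, the standalone "York" stays unmarked. That is exactly the occurrence a term-by-term highlighter would wrongly highlight for the query "new york".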

