lucene-java-user mailing list archives

From Koji Sekiguchi
Subject Re: highlighting performance
Date Tue, 21 Jun 2011 00:21:39 GMT

FVH used to be faster for large docs. I wrote the FVH section for Lucene in Action, and it said:

In contrib/benchmark (covered in appendix C), there’s an algorithm
file called highlight-vs-vector-highlight.alg that lets you see the difference
between two highlighters in processing time. As of version 2.9, with modern hardware,
that algorithm shows that FastVectorHighlighter is about two and a half times faster
than Highlighter.

That number was measured in the Lucene 2.9 era on Wikipedia data; it may be different today.

Anyway, thank you for sharing this interesting result!


(11/06/21 5:20), Mike Sokolov wrote:
> Our apps use highlighting, and I expect that highlighting is an expensive operation since it
> requires processing the text of the documents, but I ran a test and was surprised just how
> expensive it is. I made a test index with three fields: path, modified, and contents. I made the
> index using org.apache.lucene.demo.IndexFiles, modified so that the contents field is stored and
> analyzed:
> doc.add(new Field("contents", false, buf.toString(),
> There are about 8000 documents in the index, and the contents field averages around 7500 bytes.
> The total index directory size is about 242M.
> I ran a modified version of the demo.SearchFiles class that doesn't print anything out (printing
> results takes most of the time for faster queries), and runs random queries drawn from the text
> of the documents: these are a mix of (mostly) term queries, and about 20% phrase queries (that
> are phrases from the text).
> I compared a few cases: no field access, un-highlighted retrieval, highlighting with Highlighter,
> and highlighting with FastVectorHighlighter, always asking for 10 top scoring docs per query, and
> running at least 1000 queries for each case.
> No field access at all gets about 7000 qps; basically we just call,
> Then there is a big cost for retrieving the stored documents from the index:
> Retrieving each document (calling search.doc(docID)) and the path field only (a small field)
> gets about 250 qps.
> As a comparison, if I don't store the contents field in the index (and don't retrieve it at
> all), I get similar performance to the no retrieval case (around 7000 qps). OK - so there is a
> fair amount of I/O required to retrieve the stored doc; this may be unavoidable, although do
> consider that for highlighting only a small portion of the doc may ultimately be required.
> Then another big penalty is paid for highlighting:
> Highlighter gets about 60 qps
> And finally I am really mystified about this one:
> FastVectorHighlighter gets about 20 qps. There is a lot of variance here (say 9-44 qps), but it
> is always worse than Highlighter.
> If these results hold up I'll be astonished, since they imply:
> (1) FVH is not fast
> (2) Highlighting consumes most processing time (around 80%) in the best case, as compared to
> just retrieving un-highlighted documents.
> and the follow on is that at least for users that need highlighting, there is hardly any point
> in optimizing anything else!
> I thought maybe FVH required a lot of memory, so I changed the heap setting to -Xmx512m (from
> the default: 64m I think), but this had no effect.
> I also tried optimizing the index, and although this improved query performance somewhat across
> the board, it actually accentuated the cost of highlighting since the most marked improvement
> was in the basic unhighlighted query.
> Here is what the highlighting looks like:
> For FVH we allocate a single SimpleFragListBuilder, SimpleFragmentsBuilder, preTags[1],
> postTags[1], and DefaultEncoder so these don't have to be created for each query. We also cache
> the FastVectorHighlighter itself, and we call:
> highlighter.getBestFragment(highlighter.getFieldQuery(query), searcher.getIndexReader(),
> hits[i].doc, "contents", 40, flb, fb, preTags, postTags, encoder);
> once for each result.
> In the Highlighter case, we also cache the Highlighter and call:
> highlighter.getBestFragment(analyzer, "contents", doc.get("contents"));
> Does this performance profile match up with your expectations? Did I do something stupid?
> Please let me know if I can provide more info. I'm considering what can be done to speed up
> highlighting, but don't want to go off half-cocked.
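
A note on the arithmetic behind the quoted figures: throughput in qps converts to average time per query as 1000 / qps milliseconds, and the share of time added on top of a baseline is 1 - (measured qps / baseline qps). The sketch below applies those two formulas to the numbers reported in the quoted message (250 qps for stored-field retrieval, 60 qps with Highlighter, 20 qps with FastVectorHighlighter). The class and method names are hypothetical and not part of Lucene, and the calculation assumes that each highlighted query also pays the full retrieval cost.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: converts the qps figures from the
// quoted message into per-query times and into the share of time added by
// highlighting on top of plain stored-field retrieval.
public class HighlightCost {

    /** Queries per second for a run of `queries` queries that took `elapsedNanos`. */
    static double qps(long queries, long elapsedNanos) {
        return queries / (elapsedNanos / 1e9);
    }

    /** Average milliseconds spent per query at a given throughput. */
    static double millisPerQuery(double qps) {
        return 1000.0 / qps;
    }

    /**
     * Fraction of per-query time that is added on top of the baseline case.
     * Assumes the measured case includes all of the baseline's work.
     */
    static double addedShare(double baselineQps, double measuredQps) {
        return 1.0 - measuredQps / baselineQps;
    }

    public static void main(String[] args) {
        double retrieval = 250;   // stored-field retrieval only (from the message)
        double highlighter = 60;  // Highlighter (from the message)
        double fvh = 20;          // FastVectorHighlighter (from the message)

        System.out.printf(Locale.ROOT, "retrieval:   %.1f ms/query%n",
                millisPerQuery(retrieval));
        System.out.printf(Locale.ROOT, "Highlighter: %.1f ms/query, %.0f%% added%n",
                millisPerQuery(highlighter), 100 * addedShare(retrieval, highlighter));
        System.out.printf(Locale.ROOT, "FVH:         %.1f ms/query, %.0f%% added%n",
                millisPerQuery(fvh), 100 * addedShare(retrieval, fvh));
    }
}
```

With these inputs the Highlighter case works out to about 76% of per-query time spent on highlighting, which is consistent with the "around 80%" estimate in the message, and the FastVectorHighlighter case to about 92%.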
