lucene-dev mailing list archives

From Alan Woodward <alan.woodw...@romseysoftware.co.uk>
Subject Using term offsets for hit highlighting
Date Mon, 19 Mar 2012 14:38:25 GMT
Hello,

The project I'm currently working on requires reporting exact hit positions for some pretty
hairy queries, not all of which are covered by the existing highlighter modules.  I'm working
round this by translating everything into SpanQueries and using the getSpans() method to
locate hits (I've extended the Spans interface to make term offsets available - see
https://issues.apache.org/jira/browse/LUCENE-3826).  This works for our use-case, but it isn't
terribly efficient, and obviously isn't applicable to non-Span queries.

I've seen a bit of chatter on the list about using term offsets to provide accurate highlighting
in Lucene.  I'm going to have a couple of weeks free in April, and I thought I might have
a go at implementing this.  Mainly I'm wondering whether there have already been any thoughts
about how to do it.  My current idea is to somehow extend the Weight and Scorer interfaces to
make term offsets available; to get highlights for a given set of documents, you'd essentially
run the query again, with a filter restricting it to just the documents you want highlighted,
and use a custom collector that gathers the term offsets in place of the scores.
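
A minimal sketch of the "filter plus offset-collecting collector" idea described above, using a
toy in-memory index rather than Lucene itself.  Offset, OffsetCollector, ToyIndex and
HighlightCollector are hypothetical names made up for illustration; none of them are Lucene
classes, and the real Weight/Scorer plumbing is not modelled here.  It only handles single-term
queries and assumes a simple lowercasing word tokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Character offsets of one term occurrence (hypothetical type, not a Lucene class). */
final class Offset {
    final int start; // inclusive
    final int end;   // exclusive
    Offset(int start, int end) { this.start = start; this.end = end; }
    @Override public String toString() { return "[" + start + "," + end + ")"; }
}

/** Callback that receives term offsets where a normal collector would receive scores. */
interface OffsetCollector {
    void collect(int docId, String term, Offset offset);
}

/** Toy in-memory index: doc id -> term -> offsets of each occurrence. */
final class ToyIndex {
    private static final Pattern TOKEN = Pattern.compile("\\w+");
    private final Map<Integer, Map<String, List<Offset>>> postings = new TreeMap<>();

    /** Tokenizes on word characters, lowercases, and records character offsets. */
    void addDocument(int docId, String text) {
        Map<String, List<Offset>> terms = new TreeMap<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String term = m.group().toLowerCase();
            terms.computeIfAbsent(term, k -> new ArrayList<>())
                 .add(new Offset(m.start(), m.end()));
        }
        postings.put(docId, terms);
    }

    /** "Re-runs" a single-term query, visiting only the documents in docFilter. */
    void search(String term, Set<Integer> docFilter, OffsetCollector collector) {
        String key = term.toLowerCase();
        for (Map.Entry<Integer, Map<String, List<Offset>>> e : postings.entrySet()) {
            if (!docFilter.contains(e.getKey())) continue; // skip docs we don't highlight
            List<Offset> offsets = e.getValue().get(key);
            if (offsets == null) continue;                 // term absent from this doc
            for (Offset o : offsets) collector.collect(e.getKey(), key, o);
        }
    }
}

/** Gathers offsets per document, ready to hand to whatever renders the highlights. */
final class HighlightCollector implements OffsetCollector {
    final Map<Integer, List<Offset>> hits = new TreeMap<>();
    @Override public void collect(int docId, String term, Offset offset) {
        hits.computeIfAbsent(docId, k -> new ArrayList<>()).add(offset);
    }
}
```

To use it, index a few documents with addDocument(), call search() with the term and the set of
doc ids from the first-pass results, and read the per-document offsets out of
HighlightCollector.hits.  The second pass only touches the documents in the filter, which is
the main attraction of this design over recomputing highlights from stored text.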

All pointers gratefully received!

Thanks,

Alan Woodward