lucene-dev mailing list archives

From "Mike Sokolov (JIRA)" <>
Subject [jira] [Commented] (LUCENE-1889) FastVectorHighlighter: support for additional queries
Date Mon, 27 Jun 2011 13:41:48 GMT


Mike Sokolov commented on LUCENE-1889:

Robert: Thanks, that sounds like good advice. I wasn't completely happy with that Pattern list
anyway; I'm really still feeling my way around Lucene and trying things out at this point.
I wonder if you could comment on another possible idea, following up on Mike M's
quote above:

I tried hacking up SpanScorer to see if I could get positions out of it using a custom Collector,
but found that by the time a doc was reported, SpanScorer had already iterated over and dropped
the positions.  I was thinking of adding a Collector.collectSpans(int start, int end), and
having SpanScorer call it (it would be an empty function in Collector proper) or something
like that.  At this point I'm wondering if it might be possible to rewrite many queries as
some kind of SpanQuery (using a visitor), without the need to actually alter all the Query
implementations.  Is there a better way?
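
To make the collectSpans idea concrete, here is a rough, self-contained sketch. None of these
classes are Lucene's: SketchCollector, SketchSpanScorer and SpanRecordingCollector are
hypothetical stand-ins, and the map of spans only simulates what a real scorer would enumerate.
It assumes the scorer reports each span before it reports the owning document, so the collector
has to buffer spans until collect(doc) arrives.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a collector; not Lucene's actual Collector. */
abstract class SketchCollector {
    /** Called once per matching document. */
    abstract void collect(int doc);

    /** Proposed hook: a no-op by default, so existing collectors are unaffected. */
    void collectSpans(int start, int end) { }
}

/** Hypothetical scorer that reports each span before it reports the document. */
class SketchSpanScorer {
    // doc id -> list of {start, end} spans, standing in for real span enumeration
    private final Map<Integer, List<int[]>> spansByDoc;

    SketchSpanScorer(Map<Integer, List<int[]>> spansByDoc) {
        this.spansByDoc = spansByDoc;
    }

    void score(SketchCollector collector) {
        for (Map.Entry<Integer, List<int[]>> e : spansByDoc.entrySet()) {
            for (int[] span : e.getValue()) {
                collector.collectSpans(span[0], span[1]); // positions surface here
            }
            collector.collect(e.getKey()); // the doc is reported after its spans
        }
    }
}

/** Collector that buffers spans until the owning document is reported. */
class SpanRecordingCollector extends SketchCollector {
    final Map<Integer, List<int[]>> spans = new LinkedHashMap<>();
    private List<int[]> pending = new ArrayList<>();

    @Override
    void collectSpans(int start, int end) {
        pending.add(new int[] {start, end});
    }

    @Override
    void collect(int doc) {
        spans.put(doc, pending);      // attribute buffered spans to this doc
        pending = new ArrayList<>();  // start fresh for the next doc
    }
}
```

Because the hook has an empty default body, a collector that does not override it behaves
exactly as before; only a highlighter-style collector would pay for recording positions.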

I was also thinking it might be possible to capture and re-use positions gathered during the
initial scoring episode rather than having to re-score during highlighting, but I guess that's
a separate issue.

Koji: Thanks for the review, but it sounds like some more iteration is needed here, certainly
on RegExpQuery.  I probably should have tested that a bit more carefully, although the one
thing I tried (character classes) seems to work the same.

> FastVectorHighlighter: support for additional queries
> -----------------------------------------------------
>                 Key: LUCENE-1889
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: modules/highlighter
>            Reporter: Robert Muir
>            Priority: Minor
>         Attachments: LUCENE-1889.patch
> I am using fastvectorhighlighter for some strange languages and it is working well! 
> One thing i noticed immediately is that many query types are not highlighted (multitermquery,
> multiphrasequery, etc)
> Here is one thing Michael M posted in the original ticket:
> {quote}
> I think a nice [eventual] model would be if we could simply re-run the
> scorer on the single document (using InstantiatedIndex maybe, or
> simply some sort of wrapper on the term vectors which are already a
> mini-inverted-index for a single doc), but extend the scorer API to
> tell us the exact term occurrences that participated in a match (which
> I don't think is exposed today).
> {quote}
> Due to strange requirements I am using something similar to this (but specialized to
> our case).
> I am doing strange things like forcing multitermqueries to rewrite into boolean queries
> so they will be highlighted,
> and flattening multiphrasequeries into boolean or'ed phrasequeries.
> I do not think these things would be 'fast', but i had a few ideas that might help:
> * looking at contrib/highlighter, you can support FilteredQuery in flatten() by calling
> getQuery() right?
> * maybe as a last resort, try Query.extractTerms() ?
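
The "flattening multiphrasequeries into boolean or'ed phrasequeries" step described in the
issue can be sketched as a cross product over the term alternatives at each position. This is
only an illustration on plain strings, not the reporter's code and not Lucene API:
MultiPhraseFlattener and flatten() are hypothetical names, and the sketch ignores explicit
positions and gaps that a real multi-phrase query can carry.

```java
import java.util.ArrayList;
import java.util.List;

class MultiPhraseFlattener {
    /**
     * Expands per-position term alternatives into every concrete phrase.
     * Each returned phrase would become one clause of an OR'ed query.
     */
    static List<List<String>> flatten(List<List<String>> alternativesPerPosition) {
        List<List<String>> phrases = new ArrayList<>();
        phrases.add(new ArrayList<>()); // start from a single empty phrase
        for (List<String> alternatives : alternativesPerPosition) {
            List<List<String>> extended = new ArrayList<>();
            for (List<String> prefix : phrases) {
                for (String term : alternatives) {
                    List<String> phrase = new ArrayList<>(prefix);
                    phrase.add(term); // extend each partial phrase by one alternative
                    extended.add(phrase);
                }
            }
            phrases = extended;
        }
        return phrases;
    }
}
```

The number of phrases is the product of the alternative counts at each position, which is one
reason this approach would probably not be 'fast' for queries with many alternatives.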

This message is automatically generated by JIRA.