lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (LUCENE-1889) FastVectorHighlighter: support for additional queries
Date Mon, 27 Jun 2011 10:22:47 GMT


Robert Muir commented on LUCENE-1889:

{quote}
A possible issue is that regex support will differ from RegexpQuery, but I think? that Java's
is a superset, so should be ok, but I'm not sure about this one.
{quote}

Actually, these are totally different syntaxes!

An alternative way to flatten these MultiTermQueries could be to implement o.a.l.index.Terms
over what is in the term vector; then you could rewrite them with their own code.
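To make the Terms idea a bit more concrete, here is a rough sketch in plain Java with no Lucene dependency. One document's term vector is treated as a sorted term dictionary (term to positions), and each multi-term "query" walks that dictionary with its own logic (a prefix seek and a range scan here) to produce the terms, and therefore the positions, to highlight. Every name in it (TermVectorDictionary, MultiTermMatcher, prefix(), range(), positionsToHighlight()) is hypothetical; in Lucene the dictionary would be an o.a.l.index.Terms implementation and the walking would be each query's own term-enumeration code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class TermVectorRewrite {

    /** Hypothetical stand-in for a Terms view over one document's term vector. */
    static class TermVectorDictionary {
        private final NavigableMap<String, List<Integer>> terms = new TreeMap<>();

        /** Builds the per-document dictionary from tokens given in position order. */
        TermVectorDictionary(String... tokens) {
            for (int pos = 0; pos < tokens.length; pos++) {
                terms.computeIfAbsent(tokens[pos], t -> new ArrayList<>()).add(pos);
            }
        }

        NavigableMap<String, List<Integer>> sorted() {
            return terms;
        }
    }

    /** Hypothetical stand-in for a multi-term query that enumerates the dictionary its own way. */
    interface MultiTermMatcher {
        List<String> matchingTerms(NavigableMap<String, List<Integer>> dict);
    }

    /** Prefix: seek to the prefix, then stop at the first term that no longer starts with it. */
    static MultiTermMatcher prefix(String prefixText) {
        return dict -> {
            List<String> out = new ArrayList<>();
            for (String term : dict.tailMap(prefixText, true).keySet()) {
                if (!term.startsWith(prefixText)) break;
                out.add(term);
            }
            return out;
        };
    }

    /** Range: every term in [lower, upper], taken directly from the sorted dictionary. */
    static MultiTermMatcher range(String lower, String upper) {
        return dict -> new ArrayList<>(dict.subMap(lower, true, upper, true).keySet());
    }

    /** "Rewrite" against the term vector: collect the positions of all matching terms. */
    static List<Integer> positionsToHighlight(TermVectorDictionary tv, MultiTermMatcher q) {
        List<Integer> positions = new ArrayList<>();
        for (String term : q.matchingTerms(tv.sorted())) {
            positions.addAll(tv.sorted().get(term));
        }
        positions.sort(null); // natural order
        return positions;
    }

    public static void main(String[] args) {
        TermVectorDictionary tv =
            new TermVectorDictionary("lucene", "highlights", "light", "highlighting", "terms");
        System.out.println(positionsToHighlight(tv, prefix("highlight")));     // [1, 3]
        System.out.println(positionsToHighlight(tv, range("light", "lucene"))); // [0, 2]
    }
}
```

Nothing query-specific lives in positionsToHighlight(); adding another kind of multi-term query only means supplying another enumeration.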

Trying to generate an equivalent string pattern could be a little problematic. For example,
wildcard supports escaped terms (and a wildcard term could contain characters that are special in
java.util.regex syntax but not in wildcard syntax), the regex syntax is different, etc.
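As an illustration of the escaping pitfall, here is a minimal sketch (plain JDK, hypothetical class name) of a wildcard-to-java.util.regex translation that quotes every literal, so characters such as '.', '+' or '$' stay literal. It assumes wildcard syntax is '*' for any sequence, '?' for any single character and '\' to escape the next character, and it treats a trailing lone '\' as a literal (an assumption about lenient parsing). A naive translation that only rewrote '*' and '?' would let the pattern "foo.b?r*" match "fooXbar".

```java
import java.util.regex.Pattern;

public class WildcardToRegex {

    /**
     * Translates wildcard syntax into a java.util.regex Pattern, quoting every literal
     * so characters that are special to java.util.regex (but not to wildcards) stay literal.
     */
    static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        int[] cps = wildcard.codePoints().toArray(); // work on code points, not UTF-16 chars
        for (int i = 0; i < cps.length; i++) {
            int c = cps[i];
            if (c == '\\' && i + 1 < cps.length) {
                // Escaped character: emit the next code point as a quoted literal.
                regex.append(Pattern.quote(new String(Character.toChars(cps[++i]))));
            } else if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                // Ordinary character (including a trailing lone '\'): quoted literal.
                regex.append(Pattern.quote(new String(Character.toChars(c))));
            }
        }
        // DOTALL so '.' also matches line terminators that might appear inside a term.
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    public static void main(String[] args) {
        System.out.println(toPattern("foo.b?r*").matcher("foo.bar").matches()); // true
        System.out.println(toPattern("foo.b?r*").matcher("fooXbar").matches()); // false
        System.out.println(toPattern("a\\*b").matcher("a*b").matches());        // true
        System.out.println(toPattern("a\\*b").matcher("aXXb").matches());       // false
    }
}
```

This only covers the escaping problem; it does nothing about RegexpQuery's syntax being different from java.util.regex.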

If you still decide you want to do it this way, though, I would use o.a.l.util.automaton instead
of java.util.regex. Besides being faster, it is what these queries use internally anyway, so
you can convert them with, for example, WildcardQuery.toAutomaton(). Then union the resulting
automata and match against the union'ed machine instead of a List.
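The union idea can be sketched without Lucene as a tiny NFA: each wildcard pattern becomes a chain of states, the union is simulated by tracking a set of (pattern, position) pairs, and each term from the term vector is read once against the combined machine instead of being tried against every pattern in a List. This is a hypothetical, self-contained illustration (class and method names are made up, and it assumes the same '*', '?' and '\' wildcard syntax as above); in Lucene itself you would get the automata from WildcardQuery.toAutomaton() and union them with the automaton package's utilities, whose exact names vary between versions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WildcardUnion {
    // Markers for the two wildcard operators; real characters are stored as code points (>= 0).
    static final int ANY_STRING = -1; // '*'
    static final int ANY_CHAR = -2;   // '?'

    // One compiled token array per wildcard pattern.
    private final List<int[]> patterns = new ArrayList<>();

    WildcardUnion(List<String> wildcards) {
        for (String w : wildcards) {
            int[] cps = w.codePoints().toArray();
            List<Integer> tokens = new ArrayList<>();
            for (int i = 0; i < cps.length; i++) {
                if (cps[i] == '\\' && i + 1 < cps.length) tokens.add(cps[++i]); // escaped literal
                else if (cps[i] == '*') tokens.add(ANY_STRING);
                else if (cps[i] == '?') tokens.add(ANY_CHAR);
                else tokens.add(cps[i]);
            }
            patterns.add(tokens.stream().mapToInt(Integer::intValue).toArray());
        }
    }

    // A state of the union machine is (pattern index, position in that pattern), packed into a long.
    private static long state(int p, int pos) {
        return ((long) p << 32) | pos;
    }

    // Adds a state plus the states reachable by letting '*' match the empty string.
    private void addClosed(Set<Long> states, int p, int pos) {
        int[] tokens = patterns.get(p);
        states.add(state(p, pos));
        while (pos < tokens.length && tokens[pos] == ANY_STRING) {
            states.add(state(p, ++pos));
        }
    }

    // Reads the term once, advancing all patterns in parallel; true if any pattern accepts it.
    boolean matches(String term) {
        Set<Long> current = new HashSet<>();
        for (int p = 0; p < patterns.size(); p++) addClosed(current, p, 0);
        for (int c : term.codePoints().toArray()) {
            Set<Long> next = new HashSet<>();
            for (long s : current) {
                int p = (int) (s >>> 32), pos = (int) s;
                int[] tokens = patterns.get(p);
                if (pos == tokens.length) continue;               // pattern already fully consumed
                int t = tokens[pos];
                if (t == ANY_STRING) addClosed(next, p, pos);     // '*' eats c and stays put
                else if (t == ANY_CHAR || t == c) addClosed(next, p, pos + 1);
            }
            if (next.isEmpty()) return false;                     // no pattern can match any more
            current = next;
        }
        for (long s : current) {
            if ((int) s == patterns.get((int) (s >>> 32)).length) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        WildcardUnion union = new WildcardUnion(List.of("foo*", "b?r", "a\\*b"));
        for (String term : List.of("foobar", "bar", "a*b", "baz", "ab")) {
            System.out.println(term + " -> " + union.matches(term));
        }
    }
}
```

Each term is read once regardless of how many patterns there are. A real automaton library would typically also determinize the union so each character costs a single transition; this sketch simulates the NFA instead.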

But personally I would look at going the Terms/rewriteMethod route if possible; this way all
MultiTermQueries will "just work".

> FastVectorHighlighter: support for additional queries
> -----------------------------------------------------
>                 Key: LUCENE-1889
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: modules/highlighter
>            Reporter: Robert Muir
>            Priority: Minor
>         Attachments: LUCENE-1889.patch
> I am using fastvectorhighlighter for some strange languages and it is working well! 
> One thing I noticed immediately is that many query types are not highlighted (multitermquery, multiphrasequery, etc.)
> Here is one thing Michael M posted in the original ticket:
> {quote}
> I think a nice [eventual] model would be if we could simply re-run the
> scorer on the single document (using InstantiatedIndex maybe, or
> simply some sort of wrapper on the term vectors which are already a
> mini-inverted-index for a single doc), but extend the scorer API to
> tell us the exact term occurrences that participated in a match (which
> I don't think is exposed today).
> {quote}
> Due to strange requirements I am using something similar to this (but specialized to our case).
> I am doing strange things like forcing multitermqueries to rewrite into boolean queries so they will be highlighted,
> and flattening multiphrasequeries into boolean or'ed phrasequeries.
> I do not think these things would be 'fast', but I had a few ideas that might help:
> * looking at contrib/highlighter, you can support FilteredQuery in flatten() by calling getQuery(), right?
> * maybe as a last resort, try Query.extractTerms() ?
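The flattening described in the issue can be sketched over a toy query model. All classes here are hypothetical stand-ins, not Lucene's: a filtered query is unwrapped via getQuery(), OR'ed clauses are flattened recursively, and a multi-phrase query (alternative terms per position) becomes one plain phrase per combination of alternatives. The number of phrases is the product of the alternatives at each position, which is one reason this is not 'fast'.

```java
import java.util.ArrayList;
import java.util.List;

public class FlattenSketch {
    // Hypothetical, minimal query model; these are not Lucene's classes.
    interface Q {}

    static final class Phrase implements Q {
        final List<String> terms;
        Phrase(List<String> terms) { this.terms = terms; }
        @Override public String toString() { return "\"" + String.join(" ", terms) + "\""; }
    }

    /** One list of alternative terms per position, like a multi-phrase query. */
    static final class MultiPhrase implements Q {
        final List<List<String>> positions;
        MultiPhrase(List<List<String>> positions) { this.positions = positions; }
    }

    /** A query plus a filter; only the wrapped query matters for highlighting. */
    static final class Filtered implements Q {
        private final Q query;
        Filtered(Q query) { this.query = query; }
        Q getQuery() { return query; }
    }

    static final class Or implements Q {
        final List<Q> clauses;
        Or(List<Q> clauses) { this.clauses = clauses; }
    }

    /** Flattens a query tree into the plain phrases a highlighter can handle. */
    static void flatten(Q q, List<Phrase> out) {
        if (q instanceof Filtered) {
            flatten(((Filtered) q).getQuery(), out); // unwrap the filtered query
        } else if (q instanceof Or) {
            for (Q c : ((Or) q).clauses) flatten(c, out);
        } else if (q instanceof MultiPhrase) {
            expand(((MultiPhrase) q).positions, 0, new ArrayList<>(), out);
        } else if (q instanceof Phrase) {
            out.add((Phrase) q);
        }
        // Other query types: a last-resort fallback (e.g. extracting plain terms) would go here.
    }

    /** Cartesian product: one phrase per choice of alternative at each position. */
    private static void expand(List<List<String>> positions, int i, List<String> prefix, List<Phrase> out) {
        if (i == positions.size()) {
            out.add(new Phrase(new ArrayList<>(prefix)));
            return;
        }
        for (String alt : positions.get(i)) {
            prefix.add(alt);
            expand(positions, i + 1, prefix, out);
            prefix.remove(prefix.size() - 1);
        }
    }

    public static void main(String[] args) {
        Q q = new Filtered(new Or(List.of(
            new Phrase(List.of("fast", "vector")),
            new MultiPhrase(List.of(List.of("vector", "vectors"), List.of("highlighter"))))));
        List<Phrase> phrases = new ArrayList<>();
        flatten(q, phrases);
        System.out.println(phrases); // ["fast vector", "vector highlighter", "vectors highlighter"]
    }
}
```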

This message is automatically generated by JIRA.