lucene-dev mailing list archives

From "Mark Harwood (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery
Date Thu, 21 Feb 2008 08:18:43 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570979#action_12570979 ]

Mark Harwood commented on LUCENE-794:
-------------------------------------

>>This may be largely irrelevant, but Solr has a ConstantScorePrefixQuery which has
similar issues

No, very relevant. Only yesterday I had a user with exactly the same highlighting problem.

>>it seems we prob shouldn't even keep it as configurable. Just drop it then?

My nightmare scenario is systems where people are using ConstantScoreRangeQuery in their queries
to do both latitude and longitude ranges over large areas - that's a lot of terms. I'd at
least want the option of NOT loading them all into RAM at once when highlighting.

Maybe we could look at having different highlight "matchers". The existing approach of keeping
a big bag of query terms becomes a "TermsMatcher" (it simply looks up tokens in a HashSet of
terms). You can imagine a new "PrefixMatcher" which would examine tokens using "startsWith"
and a "RangeMatcher" which would examine tokens using just a start and end term. However, there's
a danger we could end up re-implementing a lot of query logic, so maybe the relevant queries/filters
could implement a "Matcher" interface. That would let the same logic that is used when scanning
TermEnum at query time be used by the Highlighter when looking at TokenStreams, i.e. something
like this:
interface Matcher
{
   // Returns true if the given token text satisfies the query's term criteria
   boolean matches(String value);
}
Needs some more thought yet but it could be an approach.
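
To make the idea a bit more concrete, here is a minimal sketch of the three matchers described above. The names TermsMatcher, PrefixMatcher and RangeMatcher come from this comment; the MatcherSketch wrapper class, the constructor signatures and the choice of inclusive range bounds compared with String.compareTo are assumptions made purely for illustration. None of this is existing Lucene API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical wrapper class so the sketch compiles as a single file.
public class MatcherSketch {

    /** The interface proposed above. */
    public interface Matcher {
        boolean matches(String value);
    }

    /** Existing approach: look each token up in a set of query terms. */
    public static final class TermsMatcher implements Matcher {
        private final Set<String> terms;

        public TermsMatcher(Set<String> terms) {
            this.terms = new HashSet<>(terms);
        }

        @Override
        public boolean matches(String value) {
            return terms.contains(value);
        }
    }

    /** Prefix queries: only the prefix is held in memory, not every expanded term. */
    public static final class PrefixMatcher implements Matcher {
        private final String prefix;

        public PrefixMatcher(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public boolean matches(String value) {
            return value.startsWith(prefix);
        }
    }

    /** Range queries: only the two end terms are held (bounds assumed inclusive). */
    public static final class RangeMatcher implements Matcher {
        private final String lower;
        private final String upper;

        public RangeMatcher(String lower, String upper) {
            this.lower = lower;
            this.upper = upper;
        }

        @Override
        public boolean matches(String value) {
            // Plain lexicographic String comparison, assumed for this sketch.
            return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
        }
    }

    public static void main(String[] args) {
        List<Matcher> matchers = Arrays.asList(
                new TermsMatcher(new HashSet<>(Arrays.asList("lucene", "solr"))),
                new PrefixMatcher("high"),
                new RangeMatcher("0100", "0200"));

        // Stand-in for tokens read from a TokenStream.
        for (String token : Arrays.asList("lucene", "highlighter", "0150", "other")) {
            boolean hit = false;
            for (Matcher m : matchers) {
                hit |= m.matches(token);
            }
            System.out.println(token + " -> " + hit);
        }
    }
}
```

The point of the sketch is the memory footprint: the prefix and range matchers each hold one or two strings regardless of how many index terms the query would expand to, which is the property wanted for the latitude/longitude case.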

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch,
spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch,
spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch,
spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch,
spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that
scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This
gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and ConstantScoreRangeQuery.
New Query types are easy to add. There is also a new Fragmenter that attempts to fragment
without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


