lucene-dev mailing list archives

From "Mark Harwood (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
Date Mon, 12 Mar 2007 19:34:09 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12480175 ]

Mark Harwood commented on LUCENE-794:
-------------------------------------

>>At a minimum, the Term fields could be set back to their original value after doing the Span search..

Hmm. If the query is being reused in a multi-threaded server environment this wouldn't fly.

>>I really don't see how it is possible to ignore fields in another way though

I can think of one. Your current approach is based on modifying the query to suit the MemoryIndex
content. Another approach may be to modify the MemoryIndex content to suit the query. Your
code creates a MemoryIndex when presented with the text of a field. If it recognised it was
being used in "field-insensitive mode" it could extract the query terms and create a MemoryIndex
field for each unique field name in the set of query terms, using the same source text (a
CachedTokenStreamAnalyzer could be used to avoid excessive tokenization of this text).
This approach would of course use some more memory but avoids the unpleasantness of changing
Query objects' contents.
I haven't fully considered the implications of this idea yet - initial thoughts?
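
The idea above can be sketched roughly as follows. This is a minimal illustration in plain Java, not the patch's code and not Lucene's MemoryIndex API: QueryTerm, ToyMemoryIndex and FieldInsensitiveIndexSketch are hypothetical names, and the simple regex tokenizer only stands in for a real analyzer. The point it shows is that the source text is tokenized once, the token list is reused for every unique field name found in the query terms, and the query objects are never modified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a query term: an immutable (field, text) pair.
final class QueryTerm {
    final String field;
    final String text;

    QueryTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }
}

// Toy single-document index: field name -> token -> positions.
// This only models the idea; it is NOT Lucene's MemoryIndex.
final class ToyMemoryIndex {
    private final Map<String, Map<String, List<Integer>>> fields = new LinkedHashMap<>();

    void addField(String field, List<String> tokens) {
        Map<String, List<Integer>> postings = new LinkedHashMap<>();
        for (int pos = 0; pos < tokens.size(); pos++) {
            postings.computeIfAbsent(tokens.get(pos), k -> new ArrayList<>()).add(pos);
        }
        fields.put(field, postings);
    }

    List<Integer> positions(String field, String text) {
        Map<String, List<Integer>> postings = fields.get(field);
        if (postings == null) {
            return Collections.emptyList();
        }
        return postings.getOrDefault(text, Collections.emptyList());
    }

    Set<String> fieldNames() {
        return fields.keySet();
    }
}

final class FieldInsensitiveIndexSketch {
    // Tokenize once; the resulting list plays the role of a cached token stream.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Build one index field per unique field name in the query terms,
    // all from the same source text. The query terms are only read.
    static ToyMemoryIndex build(String text, List<QueryTerm> queryTerms) {
        List<String> cachedTokens = tokenize(text);
        Set<String> fieldNames = new LinkedHashSet<>();
        for (QueryTerm term : queryTerms) {
            fieldNames.add(term.field);
        }
        ToyMemoryIndex index = new ToyMemoryIndex();
        for (String field : fieldNames) {
            index.addField(field, cachedTokens);
        }
        return index;
    }

    public static void main(String[] args) {
        List<QueryTerm> terms = Arrays.asList(
                new QueryTerm("title", "fox"),
                new QueryTerm("body", "quick"));
        ToyMemoryIndex index = build("The quick brown fox", terms);
        System.out.println(index.fieldNames());
        System.out.println(index.positions("title", "fox"));
    }
}
```

The memory cost mentioned above shows up here as one postings map per field name, while the tokenization cost is paid only once.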

Cheers
Mark

> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java,
> DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java,
> Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java,
> HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java,
> QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch,
> spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java,
> SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that
> scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This
> gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also
> a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

