lucene-dev mailing list archives

From "Mark Harwood (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-794) Beginnings of a span based highlighter
Date Sun, 04 Feb 2007 21:29:05 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470090 ]

Mark Harwood commented on LUCENE-794:
-------------------------------------

Looks like a good start, Mark - thanks for contributing this!

I've had a quick play and have identified the following issues:

1) The field name "contents" shouldn't be hardcoded into the Highlighter: different analyzers
can behave differently for different fields (see PerFieldAnalyzerWrapper). Either pass a field
name parameter or do as the existing highlighter does and take a TokenStream. The latter approach
has the advantage that it can avoid re-analysis and make use of any stored TermVectors
(see TokenSources.java).
2) Analyzers which produce overlapping tokens (see the synonym analyzer in the existing
highlighter's JUnit test) are problematic in the submitted code. I remember the "TokenGroup"
class in the existing highlighter was an approach to help cater for these "overlap" scenarios.
3) Without wishing to resurrect the whole 1.4 vs 1.5 debate, I believe Lucene still targets
Java 1.4.
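
Point 2 can be illustrated with a minimal sketch of merging overlapping tokens into groups by
character offset, which is the kind of "overlap" scenario TokenGroup was meant to help with.
Everything here is an assumption for illustration: OverlapGrouping, Tok and group are
hypothetical names, not Lucene's API, and the sketch uses generics for brevity rather than
targeting Java 1.4. It also assumes tokens arrive sorted by start offset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these types are Lucene classes.
class OverlapGrouping {

    // Stand-in for an analyzed token: its text plus character offsets in the source.
    static final class Tok {
        final String text;
        final int start; // inclusive start offset
        final int end;   // exclusive end offset

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Groups tokens whose offset ranges overlap. Assumes the input is sorted
    // by start offset. A token starting before the current group's end joins
    // that group; otherwise it opens a new group.
    static List<List<Tok>> group(List<Tok> tokens) {
        List<List<Tok>> groups = new ArrayList<>();
        List<Tok> current = null;
        int groupEnd = -1;
        for (Tok t : tokens) {
            if (current == null || t.start >= groupEnd) {
                current = new ArrayList<>();
                groups.add(current);
                groupEnd = t.end;
            } else {
                // Overlapping token: extend the group's end if this token reaches further.
                groupEnd = Math.max(groupEnd, t.end);
            }
            current.add(t);
        }
        return groups;
    }

    public static void main(String[] args) {
        // A synonym filter might emit "soccer" at the same offsets as "football".
        List<Tok> tokens = Arrays.asList(
                new Tok("football", 0, 8),
                new Tok("soccer", 0, 8),
                new Tok("match", 9, 14));
        for (List<Tok> g : group(tokens)) {
            StringBuilder sb = new StringBuilder();
            for (Tok t : g) {
                sb.append(t.text).append(' ');
            }
            System.out.println(sb.toString().trim());
        }
    }
}
```

With groups like these, a highlighter could wrap each group in a single pair of highlight tags
instead of tagging the same span of text once per overlapping token.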

To rectify these points it's not clear to me if it would be quicker to use your code or adapt
the existing highlighter code to use spans.
Thoughts?

Thanks again,
Mark

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>         Environment: There are probably a few Java 1.5 requirements (generics) that could
>                      easily be removed.
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java,
>                      HighlighterTest.java, QuerySpansExtractor.java, SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting approach
> to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403
> for some background.
> There is a dependency on MemoryIndex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

