lucene-dev mailing list archives

From "Mark Harwood (JIRA)" <>
Subject [jira] Commented: (LUCENE-794) Beginnings of a span based highlighter
Date Mon, 05 Feb 2007 19:44:05 GMT


Mark Harwood commented on LUCENE-794:

>>Sorry about all that Mark H
No need for any apologies - all help is gratefully received!
I don't mean to criticise your efforts or seem picky - I just wanted to record my findings
somewhere useful if we were to consider working a solution up from this "test code" rather
than tweaking the current highlighter - I'm still uncertain about the best approach. I also
thought it might be useful to point the potential issues out to you if you were already reliant
on using this code somewhere.

>>I need to read the TokenStream at least twice
>>I used the horribly hackey but quick-for-me method of adding a method to MemoryIndex
that accepts a List of Tokens. Any ideas? 

I'm not sure about modifying MemoryIndex. It should be easy enough to create a subclass of
TokenStream ("CachedTokenStream" perhaps?) which takes a real TokenStream in its constructor,
delegates all "next" calls to it on the first use, and records each returned token in a List.
It can then be "rewound" and re-used to run through the same set of tokens held in the list
from the first run.
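
A rough sketch of that idea follows. To keep it compilable without Lucene on the classpath
it uses stand-in types (SketchToken, SketchTokenStream, ListTokenStream) that are assumptions
of this sketch, not Lucene classes; the real TokenStream.next() also declares IOException,
which is left out here. The name rewind() is likewise only a suggestion.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Lucene's Token, reduced to the term text.
class SketchToken {
    final String text;
    SketchToken(String text) { this.text = text; }
}

// Hypothetical stand-in for Lucene's TokenStream: next() returns null when exhausted.
abstract class SketchTokenStream {
    abstract SketchToken next();
}

// Simple source stream used only to drive the example.
class ListTokenStream extends SketchTokenStream {
    private final Iterator<SketchToken> it;
    ListTokenStream(List<SketchToken> tokens) { this.it = tokens.iterator(); }
    @Override
    SketchToken next() { return it.hasNext() ? it.next() : null; }
}

// Delegates to the wrapped stream on first use, recording every token,
// and replays the recorded tokens after rewind().
class CachedTokenStream extends SketchTokenStream {
    private final SketchTokenStream input;
    private final List<SketchToken> cache = new ArrayList<SketchToken>();
    private boolean inputExhausted = false;
    private int replayPos = 0;

    CachedTokenStream(SketchTokenStream input) { this.input = input; }

    @Override
    SketchToken next() {
        // Replay from the cache while recorded tokens remain ahead of us.
        if (replayPos < cache.size()) {
            return cache.get(replayPos++);
        }
        if (inputExhausted) {
            return null;
        }
        // Otherwise pull a fresh token from the real stream and record it.
        SketchToken t = input.next();
        if (t == null) {
            inputExhausted = true;
            return null;
        }
        cache.add(t);
        replayPos++;
        return t;
    }

    // Start again from the first recorded token; the wrapped stream is not re-read.
    void rewind() { replayPos = 0; }
}
```

With this shape the highlighter could consume the stream once, call rewind(), and hand the
same instance to a second consumer without re-running the analyzer.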

>>if position increment equals 0 skip printing out the token...but I am not totally
confident it is perfect yet. 

I think it's possible some of the more Byzantine analyzers may have a position increment > 0
but overlap in terms of their byte offsets. I'd need to check the old JUnit tests to be sure
on this. Welcome to my hell!
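
To make the concern concrete, here is a small sketch of a check for that case: tokens that
advance the position but whose start offset falls before the furthest end offset already
seen. OffsetToken and OverlapCheck are hypothetical names for this example, not Lucene
classes, and the sample tokens in the comment are invented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token carrying offsets and a position increment.
class OffsetToken {
    final String text;
    final int startOffset;
    final int endOffset;
    final int positionIncrement;

    OffsetToken(String text, int startOffset, int endOffset, int positionIncrement) {
        this.text = text;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.positionIncrement = positionIncrement;
    }
}

class OverlapCheck {
    // Returns tokens with positionIncrement > 0 whose start offset lies before
    // the largest end offset of any earlier token, i.e. tokens that a
    // "skip when increment == 0" rule would not catch.
    static List<OffsetToken> findOverlaps(List<OffsetToken> tokens) {
        List<OffsetToken> overlaps = new ArrayList<OffsetToken>();
        int maxEnd = 0;
        for (OffsetToken t : tokens) {
            if (t.positionIncrement > 0 && t.startOffset < maxEnd) {
                overlaps.add(t);
            }
            maxEnd = Math.max(maxEnd, t.endOffset);
        }
        return overlaps;
    }
}

// Invented example: ("quick", 0, 5, inc 1), ("qui", 0, 3, inc 1), ("brown", 6, 11, inc 1)
// Only "qui" is reported: it advances the position yet starts inside "quick".
```

A highlighter that prints tokens by offset would need to handle whatever this reports, for
example by merging the overlapping ranges before writing output.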

Thanks again for your help.
Mark H

> Beginnings of a span based highlighter
> --------------------------------------
>                 Key: LUCENE-794
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments:,,,,,,,,,
> This is some test code to start the work of adding a span based highlighting approach
to the existing highlighter in contrib. See
for some background.
> There is a dependency on MemoryIndex.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.