lucene-java-user mailing list archives

From David Spencer <>
Subject Re: Fix for "advanced tokenizers and highlighter" problem
Date Wed, 23 Jun 2004 00:24:37 GMT

wrote:
> I think this version of the highlighter should provide a fix:
> Before I update the version of the highlighter in the sandbox, I'd appreciate feedback from those troubled
> with the issues to do with overlapping tokens in token streams (Erik, Dave, Bruce?)

1st pass of testing - yes, this does indeed fix the problem.
I've realized I may want to modify my Analyzer now too.
I was focusing on the Token position increment instead of the offset.
In the case where I break "HashMap" into 3 tokens,
"Hash", "Map" and "HashMap", I was returning the same start/end offsets for
all of them, so a search on "Map" ends up with all of "HashMap"
being highlighted. It is probably more correct to return offsets within
the original, larger token so that you can see exactly where your term
matched. I'll update my code and then put up a site that demonstrates this.
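A minimal sketch of the offset change described above, written in plain Java so it runs without Lucene. `SimpleToken` and `split` are hypothetical names for illustration, not Lucene's `Token` API. The sketch only computes offsets and ignores position increments, and it assumes each part begins at an uppercase letter (so an all-caps word such as "URL" would split into single letters).

```java
import java.util.ArrayList;
import java.util.List;

public class CamelCaseOffsets {

    // Hypothetical stand-in for a token: term text plus character offsets
    // into the original field text. Not Lucene's Token class.
    static final class SimpleToken {
        final String text;
        final int startOffset;
        final int endOffset;

        SimpleToken(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return text + "[" + startOffset + "," + endOffset + ")";
        }
    }

    // Splits a camel-case word that starts at wordStart in the source text.
    // Each part gets offsets covering only its own characters; the whole word
    // is then emitted with the offsets of the full word.
    static List<SimpleToken> split(String word, int wordStart) {
        List<SimpleToken> out = new ArrayList<>();
        int partStart = 0;
        for (int i = 1; i <= word.length(); i++) {
            boolean boundary = i == word.length() || Character.isUpperCase(word.charAt(i));
            if (boundary) {
                out.add(new SimpleToken(word.substring(partStart, i),
                        wordStart + partStart, wordStart + i));
                partStart = i;
            }
        }
        // Only add the whole word separately if it was actually split.
        if (out.size() > 1) {
            out.add(new SimpleToken(word, wordStart, wordStart + word.length()));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "new HashMap()";
        int start = text.indexOf("HashMap");
        for (SimpleToken t : split("HashMap", start)) {
            // Show which slice of the original text each token's offsets select.
            System.out.println(t + " -> \"" + text.substring(t.startOffset, t.endOffset) + "\"");
        }
    }
}
```

Running `main` prints `Hash[4,8) -> "Hash"`, `Map[8,11) -> "Map"` and `HashMap[4,11) -> "HashMap"`, so a hit on "Map" would select only characters 8 to 11 of the original text.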


> I added my own test analyzer to the JUnit test that introduces synonyms into the token stream at the same
> position as the trigger token, and the new code works OK for me with that analyzer.
> The fix means I needed to change the Formatter interface: this now takes a "TokenGroup" object instead
> of a token, because that can be used to represent a single token OR a sequence of overlapping tokens.
> I don't think most people have needed to create custom Formatter implementations, so I don't think this
> redefined interface should break too much existing code (if any).
> Cheers
> Mark
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
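The grouping idea behind the TokenGroup change can be sketched roughly as follows, assuming every token carries character offsets into the original text. `Tok`, `Group`, `GroupFormatter` and `highlight` are hypothetical stand-ins written for illustration; they are not the highlighter's actual classes or method signatures. Tokens whose offset ranges overlap (here, a synonym injected at the same offsets as its trigger word) are merged into one group, and the formatter is called once per matching group rather than once per token.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

public class OverlapHighlightSketch {

    // Hypothetical stand-in for a token: term text plus character offsets.
    static final class Tok {
        final String text;
        final int start;
        final int end;

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Hypothetical group of overlapping tokens, spanning from the smallest
    // start offset to the largest end offset of its members.
    static final class Group {
        final List<Tok> tokens = new ArrayList<>();
        int start;
        int end;

        void add(Tok t) {
            start = tokens.isEmpty() ? t.start : Math.min(start, t.start);
            end = tokens.isEmpty() ? t.end : Math.max(end, t.end);
            tokens.add(t);
        }
    }

    // Hypothetical formatter contract: it receives the original text of the
    // whole group plus the group itself, not a single token.
    interface GroupFormatter {
        String highlight(String originalText, Group group);
    }

    // Sorts tokens by start offset; a token that begins before the current
    // group ends joins that group, otherwise it starts a new one.
    static List<Group> group(List<Tok> tokens) {
        List<Tok> sorted = new ArrayList<>(tokens);
        sorted.sort(Comparator.comparingInt((Tok t) -> t.start));
        List<Group> groups = new ArrayList<>();
        Group current = null;
        for (Tok t : sorted) {
            if (current == null || t.start >= current.end) {
                current = new Group();
                groups.add(current);
            }
            current.add(t);
        }
        return groups;
    }

    // Rebuilds the text, passing each group that contains a query term to the
    // formatter exactly once and copying all other text through unchanged.
    static String highlight(String text, List<Tok> tokens, Set<String> queryTerms,
                            GroupFormatter formatter) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Group g : group(tokens)) {
            out.append(text, last, g.start);
            String span = text.substring(g.start, g.end);
            boolean hit = g.tokens.stream().anyMatch(t -> queryTerms.contains(t.text));
            out.append(hit ? formatter.highlight(span, g) : span);
            last = g.end;
        }
        out.append(text.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "the quick fox";
        // "fast" is a synonym injected at the same offsets as "quick".
        List<Tok> tokens = List.of(
                new Tok("the", 0, 3), new Tok("quick", 4, 9),
                new Tok("fast", 4, 9), new Tok("fox", 10, 13));
        GroupFormatter bold = (original, group) -> "<B>" + original + "</B>";
        System.out.println(highlight(text, tokens, Set.of("fast"), bold));
    }
}
```

Running `main` prints `the <B>quick</B> fox`: the match on the synonym "fast" highlights the original word once, instead of producing overlapping markup for "quick" and "fast". The same grouping would also merge the "Hash", "Map" and "HashMap" tokens, since their offsets overlap.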

