lucene-general mailing list archives

From Chris Hostetter <>
Subject Re: text highlighting problem
Date Mon, 12 Mar 2007 20:41:26 GMT

:    MyCanonizer textCanonizer = new MyCanonizer();
:    TokenStream ts = new StandardTokenizer(textCanonizer.peformCanonization(reader));
:    return ts;

: Could anybody say why the highlights are shifted and/or how to solve the
: problem?

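the highlights are shifted because the character positions (start/end
offsets) the highlighter knows about are not offsets into your source
reader -- they are offsets into the reader returned by
textCanonizer.peformCanonization(reader), so wherever canonization changes
the length of the text, every highlight after that point lands in the wrong
place.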
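A toy model of the mismatch, in plain Java rather than Lucene's API: the `Tok` class, the whitespace `tokenize` method and the hyphen-stripping `canonize` rule are all hypothetical stand-ins (the real canonization logic isn't shown in the thread). Tokenizing the canonized text records offsets into the canonized text, and applying those offsets to the original text picks out the wrong characters once the lengths diverge.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the problem; this is NOT Lucene's API.
public class OffsetShiftDemo {

    // A token: term text plus the start/end character offsets
    // that a highlighter would use to mark up the source text.
    static final class Tok {
        final String term;
        final int start, end;
        Tok(String term, int start, int end) {
            this.term = term; this.start = start; this.end = end;
        }
    }

    // Hypothetical stand-in for peformCanonization(): strips hyphens,
    // so "e-mail" becomes "email" and the text gets one char shorter.
    static String canonize(String text) {
        return text.replace("-", "");
    }

    // Minimal whitespace tokenizer; offsets refer to whatever text it is given.
    static List<Tok> tokenize(String text) {
        List<Tok> out = new ArrayList<>();
        int i = 0, n = text.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < n && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) out.add(new Tok(text.substring(start, i), start, i));
        }
        return out;
    }

    public static void main(String[] args) {
        String original = "send e-mail now";
        // Offsets come from the canonized text "send email now"...
        for (Tok t : tokenize(canonize(original))) {
            // ...but are applied to the original text, as the highlighter does.
            System.out.println(t.term + " -> [" + original.substring(t.start, t.end) + "]");
        }
        // prints:
        // send -> [send]
        // email -> [e-mai]
        // now -> [ no]
    }
}
```

Everything before the first length-changing edit still lines up; everything after it is shifted by the accumulated difference.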

you would probably be better off implementing your special logic either as
a TokenFilter, in which case you just modify the token text but leave the
offset info alone, or as a Tokenizer that emits Tokens whose text you have
already modified, but whose start/end offsets you record from the original
input.
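A sketch of the TokenFilter idea, again as a plain-Java model under stated assumptions rather than Lucene code: `Tok`, `tokenize`, `canonizeTerm` (a hypothetical hyphen-stripping rule) and `canonizeFilter` are illustrative names. The original text is tokenized first, then each token's text is canonized while its offsets are copied through unchanged, so highlights land on the original characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the TokenFilter approach; this is NOT Lucene's API.
public class CanonizingFilterDemo {

    // A token: term text plus start/end offsets into the ORIGINAL text.
    static final class Tok {
        final String term;
        final int start, end;
        Tok(String term, int start, int end) {
            this.term = term; this.start = start; this.end = end;
        }
    }

    // Hypothetical per-token canonization rule: strip hyphens.
    static String canonizeTerm(String term) {
        return term.replace("-", "");
    }

    // Minimal whitespace tokenizer run over the original, unmodified text.
    static List<Tok> tokenize(String text) {
        List<Tok> out = new ArrayList<>();
        int i = 0, n = text.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < n && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) out.add(new Tok(text.substring(start, i), start, i));
        }
        return out;
    }

    // The "filter" step: new term text, original start/end offsets.
    static List<Tok> canonizeFilter(List<Tok> in) {
        List<Tok> out = new ArrayList<>(in.size());
        for (Tok t : in) {
            out.add(new Tok(canonizeTerm(t.term), t.start, t.end));
        }
        return out;
    }

    public static void main(String[] args) {
        String original = "send e-mail now";
        for (Tok t : canonizeFilter(tokenize(original))) {
            System.out.println(t.term + " -> [" + original.substring(t.start, t.end) + "]");
        }
        // prints:
        // send -> [send]
        // email -> [e-mail]
        // now -> [now]
    }
}
```

The indexed term is the canonized "email", but the highlighter still marks "e-mail" in the stored text. In a real Lucene TokenFilter this corresponds to rewriting only the term text and leaving the token's offsets untouched; the exact classes and methods for that depend on the Lucene version.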

(not knowing exactly what your canonization does makes it hard to say
whether the TokenFilter approach will work for you, but if it does, it's
probably the simplest and most reusable option)

