lucene-java-user mailing list archives

From Robert Muir <>
Subject Re: PositionLengthAttribute
Date Sat, 07 Sep 2013 12:39:31 GMT
On Sat, Sep 7, 2013 at 7:44 AM, Benson Margulies <> wrote:
> In Japanese, compounds are just decompositions of the input string. In
> other languages, compounds can manufacture entire tokens from thin
> air. In those cases, it's something of a question how to decide on the
> offsets. I think that you're right, eventually, insofar as there's
> some offset in the original that might as well be blamed for any given
> component.

Why change the offsets then? Offsets are for highlighting. Let the
whole compound be highlighted when it's a match in search results. It's
transparent and totally accurate as to what is happening: this is why
we do highlighting, to help the user make a relevance assessment
about the document, not to help the end user debug the
analysis chain or anything like that.
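A minimal sketch of that idea, written in plain Java with no Lucene dependency. The `Token` class, the `decompound` helper, and the split of "hilfsmittel" into "hilfe" + "mittel" are hypothetical illustrations, not Lucene API. In a real analysis chain the same values would live in `OffsetAttribute`, `PositionIncrementAttribute`, and `PositionLengthAttribute`. Note that "hilfe" is manufactured rather than being a substring of the input. Every part simply reuses the compound's start/end offsets, so a highlighter marks the whole word. The compound itself spans as many positions as it has parts.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, no Lucene dependency) of emitting a compound plus its parts.
public class CompoundOffsets {

    // Hypothetical stand-in for a token carrying offset and position attributes.
    static final class Token {
        final String term;
        final int startOffset;
        final int endOffset;
        final int positionIncrement;
        final int positionLength;

        Token(String term, int startOffset, int endOffset,
              int positionIncrement, int positionLength) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.positionIncrement = positionIncrement;
            this.positionLength = positionLength;
        }

        @Override
        public String toString() {
            return term + "[" + startOffset + "," + endOffset + ") posInc="
                    + positionIncrement + " posLen=" + positionLength;
        }
    }

    // Hypothetical helper: parts need not be substrings of the compound.
    // Each part reuses the compound's offsets instead of inventing new ones.
    static List<Token> decompound(Token compound, List<String> parts) {
        if (parts.isEmpty()) {
            return List.of(compound); // nothing to split; keep the token unchanged
        }
        List<Token> out = new ArrayList<>();
        // The compound keeps its position and spans one position per part.
        out.add(new Token(compound.term, compound.startOffset, compound.endOffset,
                compound.positionIncrement, parts.size()));
        for (int i = 0; i < parts.size(); i++) {
            // First part stacks on the compound's position (increment 0);
            // each later part advances by one position.
            int posInc = (i == 0) ? 0 : 1;
            out.add(new Token(parts.get(i), compound.startOffset, compound.endOffset,
                    posInc, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        Token compound = new Token("hilfsmittel", 0, 11, 1, 1);
        for (Token t : decompound(compound, List.of("hilfe", "mittel"))) {
            System.out.println(t);
        }
    }
}
```

Running `main` prints the compound with `posLen=2` followed by both parts, all three carrying offsets `[0,11)`. A match on either part therefore highlights the whole compound, which is the behavior argued for above.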
