lucene-dev mailing list archives

From "Adrien Grand (JIRA)"
Subject [jira] [Commented] (LUCENE-7429) DelegatingAnalyzerWrapper should delegate normalization too
Date Thu, 01 Sep 2016 10:19:21 GMT


Adrien Grand commented on LUCENE-7429:

bq. The issue here is mostly that we need to create a new TokenStream (StringTokenStream)
for the normalization and we need to use the same attribute types.

Exactly. For instance if a term attribute produces utf-16 encoded tokens, 

bq. Although this is sometimes broken for use-cases, where TokenStreams create binary tokens.
But those would never be normalized, I think (!?)

Do you mean that you cannot think of any use-case for combining a non-default term attribute
with token filters in the same analysis chain? I am wondering about CJK analyzers, since I think
UTF-16 stores CJK characters a bit more efficiently on average than UTF-8 (I may be completely
wrong, in which case please let me know), so users might be tempted to use a different term
attribute implementation?
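
To make the size comparison concrete, here is a small sketch that only uses the JDK. The class and method names (EncodingSizes, utf8Bytes, utf16Bytes) are made up for illustration and have nothing to do with Lucene's term attribute API. It counts encoded bytes for a short CJK string and for an ASCII string.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: EncodingSizes is a hypothetical helper, not part of Lucene.
public class EncodingSizes {

    // Number of bytes needed to encode s as UTF-8.
    public static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed to encode s as UTF-16 (big-endian, so no BOM is added).
    public static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String cjk = "\u65E5\u672C\u8A9E"; // three CJK characters from the Basic Multilingual Plane
        String ascii = "lucene";

        // BMP CJK characters take 3 bytes each in UTF-8 but 2 bytes each in UTF-16.
        System.out.println("CJK   utf8=" + utf8Bytes(cjk) + " utf16=" + utf16Bytes(cjk));
        // ASCII characters take 1 byte each in UTF-8 but 2 bytes each in UTF-16.
        System.out.println("ASCII utf8=" + utf8Bytes(ascii) + " utf16=" + utf16Bytes(ascii));
    }
}
```

So the saving holds for text that is mostly CJK, and it reverses for mostly-ASCII text.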

> DelegatingAnalyzerWrapper should delegate normalization too
> -----------------------------------------------------------
>                 Key: LUCENE-7429
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 6.2
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7355.patch, LUCENE-7429.patch, LUCENE-7429.patch
> This is something that I overlooked in LUCENE-7355: (Delegating)AnalyzerWrapper uses
> the default implementation of initReaderForNormalization and normalize, meaning that by default
> the normalization is a no-op. It should delegate to the wrapped analyzer.
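
The shape of the bug can be sketched with a toy model. Everything below is hypothetical and heavily simplified: SimpleAnalyzer and the wrapper classes are stand-ins written for this example, they operate on plain strings, and they are not Lucene's Analyzer or AnalyzerWrapper classes nor the attached patch. The sketch only shows why a wrapper that inherits a no-op default silently drops the wrapped analyzer's normalization.

```java
import java.util.Locale;

// Toy model only: none of these classes are Lucene classes.
public class DelegationSketch {

    // Hypothetical stand-in for an analyzer whose normalization defaults to a no-op.
    public abstract static class SimpleAnalyzer {
        public String normalize(String fieldName, String text) {
            return text; // default: leave the text unchanged
        }
    }

    // An analyzer that actually normalizes, here by lowercasing.
    public static class LowercasingAnalyzer extends SimpleAnalyzer {
        @Override
        public String normalize(String fieldName, String text) {
            return text.toLowerCase(Locale.ROOT);
        }
    }

    // Wrapper that forgets to override normalize, so it inherits the no-op default.
    public static class NonDelegatingWrapper extends SimpleAnalyzer {
        private final SimpleAnalyzer wrapped;

        public NonDelegatingWrapper(SimpleAnalyzer wrapped) {
            this.wrapped = wrapped;
        }
    }

    // Wrapper that forwards normalization to the analyzer it wraps.
    public static class DelegatingWrapper extends SimpleAnalyzer {
        private final SimpleAnalyzer wrapped;

        public DelegatingWrapper(SimpleAnalyzer wrapped) {
            this.wrapped = wrapped;
        }

        @Override
        public String normalize(String fieldName, String text) {
            return wrapped.normalize(fieldName, text);
        }
    }

    public static void main(String[] args) {
        SimpleAnalyzer inner = new LowercasingAnalyzer();
        // Inherited no-op: the input comes back unchanged.
        System.out.println(new NonDelegatingWrapper(inner).normalize("body", "FooBar"));
        // Delegation: the wrapped analyzer's lowercasing is applied.
        System.out.println(new DelegatingWrapper(inner).normalize("body", "FooBar"));
    }
}
```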
