lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-7429) DelegatingAnalyzerWrapper should delegate normalization too
Date Thu, 01 Sep 2016 10:19:21 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15455008#comment-15455008 ]

Adrien Grand commented on LUCENE-7429:
--------------------------------------

bq. The issue here is mostly that we need to create a new TokenStream (StringTokenStream)
for the normalization and we need to use the same attribute types.

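Exactly. For instance, if a term attribute produces UTF-16-encoded tokens, the normalized term has to be produced through that same attribute type so that it is encoded the same way.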
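To make the encoding concern concrete, here is a minimal plain-Java sketch. It is not Lucene code: the class and method names are hypothetical, and it assumes a custom term attribute that writes index-time terms as UTF-16LE. Because terms are matched as raw bytes, a query-time normalization that lowercases correctly but encodes as UTF-8 still fails to match.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration, not Lucene's API: terms are compared as raw bytes,
// so normalization must encode its output exactly like index-time analysis does.
public class EncodingMismatch {

    // Assumption: a custom term attribute that encodes index-time terms as UTF-16LE.
    static byte[] indexTimeBytes(String analyzedTerm) {
        return analyzedTerm.getBytes(StandardCharsets.UTF_16LE);
    }

    // Normalization that lowercases correctly but encodes with UTF-8,
    // i.e. it does not reuse the custom attribute's encoding.
    static byte[] normalizeUtf8(String queryText) {
        return queryText.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
    }

    // Normalization that lowercases and reuses the index-time encoding.
    static byte[] normalizeSameEncoding(String queryText) {
        return queryText.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] indexed = indexTimeBytes("lucene");
        System.out.println(Arrays.equals(indexed, normalizeUtf8("Lucene")));         // false
        System.out.println(Arrays.equals(indexed, normalizeSameEncoding("Lucene"))); // true
    }
}
```

The characters are identical in both cases; only the byte encoding differs, which is why the normalization path has to go through the same attribute type as indexing.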

bq. This is sometimes broken, though, for use cases where TokenStreams create binary tokens.
But those would never be normalized, I think (!?)

Do you mean that you cannot think of any use case for combining a non-default term attribute
with token filters in the same analysis chain? I am wondering about CJK analyzers, since I think
UTF-16 stores CJK characters a bit more compactly on average than UTF-8 (I may be completely
wrong, in which case please let me know), so users might be tempted to use a different term
attribute implementation.
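A quick way to check the size claim is to encode a short CJK string both ways with the JDK's standard charsets. This is a standalone sketch, not Lucene code; the class name and sample strings are chosen for illustration. It uses `UTF_16LE` because `StandardCharsets.UTF_16` prepends a two-byte byte-order mark when encoding.

```java
import java.nio.charset.StandardCharsets;

// Compares encoded sizes of the same text in UTF-8 and UTF-16.
public class CjkEncodingSize {

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // UTF_16LE is used so that no byte-order mark is added to the count.
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    public static void main(String[] args) {
        String cjk = "\u65E5\u672C\u8A9E"; // three CJK characters in the Basic Multilingual Plane
        String latin = "lucene";
        System.out.println(utf8Length(cjk) + " vs " + utf16Length(cjk));     // 9 vs 6
        System.out.println(utf8Length(latin) + " vs " + utf16Length(latin)); // 6 vs 12
    }
}
```

For CJK characters in the Basic Multilingual Plane, UTF-8 needs three bytes per character and UTF-16 needs two, while ASCII text goes the other way. Characters outside the BMP take four bytes in both encodings.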

> DelegatingAnalyzerWrapper should delegate normalization too
> -----------------------------------------------------------
>
>                 Key: LUCENE-7429
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7429
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 6.2
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7355.patch, LUCENE-7429.patch, LUCENE-7429.patch
>
>
> This is something that I overlooked in LUCENE-7355: (Delegating)AnalyzerWrapper uses
> the default implementation of initReaderForNormalization and normalize, meaning that by default
> the normalization is a no-op. It should delegate to the wrapped analyzer.
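The bug described in the issue is a delegation gap, which the following deliberately simplified, hypothetical plain-Java model illustrates. `SimpleAnalyzer`, `LowercasingAnalyzer`, `NonDelegatingWrapper` and `DelegatingWrapper` are invented for illustration and are not Lucene's classes or signatures. The model also ignores per-field analyzer selection and the reader-level `initReaderForNormalization` step. It only shows why a wrapper that inherits a no-op default silently disables the wrapped analyzer's normalization.

```java
import java.util.Locale;

// Hypothetical model of the delegation problem; NOT Lucene's Analyzer/AnalyzerWrapper.
public class NormalizationDelegation {

    interface SimpleAnalyzer {
        // Default normalization is a no-op, mirroring the default described in the issue.
        default String normalize(String fieldName, String text) {
            return text;
        }
    }

    // A wrapped analyzer whose normalization lowercases its input.
    static class LowercasingAnalyzer implements SimpleAnalyzer {
        @Override
        public String normalize(String fieldName, String text) {
            return text.toLowerCase(Locale.ROOT);
        }
    }

    // Wrapper that does not override normalize(), so it inherits the no-op default
    // even though it holds a reference to the analyzer it wraps.
    static class NonDelegatingWrapper implements SimpleAnalyzer {
        private final SimpleAnalyzer delegate;

        NonDelegatingWrapper(SimpleAnalyzer delegate) {
            this.delegate = delegate;
        }
    }

    // Wrapper that forwards normalization to the wrapped analyzer.
    static class DelegatingWrapper implements SimpleAnalyzer {
        private final SimpleAnalyzer delegate;

        DelegatingWrapper(SimpleAnalyzer delegate) {
            this.delegate = delegate;
        }

        @Override
        public String normalize(String fieldName, String text) {
            return delegate.normalize(fieldName, text);
        }
    }

    public static void main(String[] args) {
        SimpleAnalyzer inner = new LowercasingAnalyzer();
        System.out.println(new NonDelegatingWrapper(inner).normalize("body", "Lucene")); // Lucene
        System.out.println(new DelegatingWrapper(inner).normalize("body", "Lucene"));    // lucene
    }
}
```

In this model the non-delegating wrapper returns the input unchanged, so a query term such as "Lucene" would not be lowercased to match the indexed "lucene".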



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
