hivemall-issues mailing list archives

From "Satoshi Iijima (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVEMALL-208) tokenize_ja failed to analyze certain Japanese strings
Date Wed, 27 Jun 2018 07:03:00 GMT
Satoshi Iijima created HIVEMALL-208:
---------------------------------------

             Summary: tokenize_ja failed to analyze certain Japanese strings
                 Key: HIVEMALL-208
                 URL: https://issues.apache.org/jira/browse/HIVEMALL-208
             Project: Hivemall
          Issue Type: Bug
    Affects Versions: 0.5.0
            Reporter: Satoshi Iijima


tokenize_ja fails to analyze certain Japanese strings and outputs the error below.
{panel}
java.lang.ArrayIndexOutOfBoundsException: -1
 at org.apache.lucene.analysis.ja.JapaneseTokenizer.backtrace(JapaneseTokenizer.java:1024)
 at org.apache.lucene.analysis.ja.JapaneseTokenizer.parse(JapaneseTokenizer.java:873)
 at org.apache.lucene.analysis.ja.JapaneseTokenizer.incrementToken(JapaneseTokenizer.java:474)
 at org.apache.lucene.analysis.ja.JapaneseBaseFormFilter.incrementToken(JapaneseBaseFormFilter.java:50)
 at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:51)
 at org.apache.lucene.analysis.cjk.CJKWidthFilter.incrementToken(CJKWidthFilter.java:63)
 at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:51)
 at org.apache.lucene.analysis.ja.JapaneseKatakanaStemFilter.incrementToken(JapaneseKatakanaStemFilter.java:63)
 at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:45)
 at hivemall.nlp.tokenizer.KuromojiUDF.analyzeTokens(KuromojiUDF.java:292)
 at hivemall.nlp.tokenizer.KuromojiUDF.evaluate(KuromojiUDF.java:117)
{panel}
The cause is LUCENE-7279, which has already been fixed, so Lucene needs to be upgraded.
The affected versions include not only v0.5.0 but also v0.4.2.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
