hivemall-issues mailing list archives

From "Satoshi Iijima (JIRA)" <>
Subject [jira] [Created] (HIVEMALL-208) tokenize_ja failed to analyze certain Japanese strings
Date Wed, 27 Jun 2018 07:03:00 GMT
Satoshi Iijima created HIVEMALL-208:

             Summary: tokenize_ja failed to analyze certain Japanese strings
                 Key: HIVEMALL-208
             Project: Hivemall
          Issue Type: Bug
    Affects Versions: 0.5.0
            Reporter: Satoshi Iijima

tokenize_ja failed to analyze certain Japanese strings and output the error below.
java.lang.ArrayIndexOutOfBoundsException: -1
 at org.apache.lucene.analysis.ja.JapaneseTokenizer.backtrace(
 at org.apache.lucene.analysis.ja.JapaneseTokenizer.parse(
 at org.apache.lucene.analysis.ja.JapaneseTokenizer.incrementToken(
 at org.apache.lucene.analysis.ja.JapaneseBaseFormFilter.incrementToken(
 at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(
 at org.apache.lucene.analysis.cjk.CJKWidthFilter.incrementToken(
 at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(
 at org.apache.lucene.analysis.ja.JapaneseKatakanaStemFilter.incrementToken(
 at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(
 at hivemall.nlp.tokenizer.KuromojiUDF.analyzeTokens(
 at hivemall.nlp.tokenizer.KuromojiUDF.evaluate(
The cause is LUCENE-7279, which has already been fixed. Lucene needs to be upgraded.
The affected versions include not only v0.5.0 but also v0.4.2.


This message was sent by Atlassian JIRA
