lucene-issues mailing list archives

From "Michael Sokolov (Jira)" <j...@apache.org>
Subject [jira] [Comment Edited] (LUCENE-9064) Can we remove the FST cache in Kuromoji and Nori analyzers?
Date Mon, 25 Nov 2019 17:44:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-9064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981739#comment-16981739 ]

Michael Sokolov edited comment on LUCENE-9064 at 11/25/19 5:43 PM:
-------------------------------------------------------------------

[~bruno.roustant] there is TestJapaneseTokenizer.testWikipedia (commented out). To run it,
you must download jawiki from Wikipedia and edit the test to point at the file you
downloaded. You might also have to disable security manager checks that prevent reading
from arbitrary locations in the filesystem.


was (Author: sokolov):
[~bruno.roustant] there is TestJapaneseTokenizer.testWikipedia. To run it, you must download
jawiki from Wikipedia and edit the test to point at the file you downloaded. You might also
have to disable security manager checks that prevent reading from arbitrary locations in the
filesystem.

> Can we remove the FST cache in Kuromoji and Nori analyzers?
> -----------------------------------------------------------
>
>                 Key: LUCENE-9064
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9064
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Bruno Roustant
>            Priority: Minor
>
> Is the ~30k Han character cache in Kuromoji redundant after LUCENE-8920?
> (see [https://github.com/apache/lucene-solr/blob/813ca77250db29116812bc949e2a466a70f969a3/lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoFST.java#L35-L38])
> The entire linked file exists to support this caching, so if it's no longer needed, removing
> it would be a nice cleanup. But it was definitely needed for good performance before, so we
> should be careful. The Nori analyzer has the exact same thing (in a file with the same name)
> for ~10k Hangul syllables.
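To make the trade-off in the question concrete, here is a minimal, hypothetical sketch of the general idea behind such a cache: answers for a fixed, contiguous code point range are precomputed into an array, so the hot path is a single array read, while code points outside the range fall back to the slower underlying lookup. This is not Lucene's TokenInfoFST code and does not model FST arcs; the class name RangeCachedLookup, the HashMap stand-in for the dictionary, and the U+4E00..U+9FFF range are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical sketch, NOT Lucene's TokenInfoFST: precompute lookups for a
// contiguous code point range into an array and delegate everything else.
public class RangeCacheDemo {

    static final class RangeCachedLookup<V> {
        private final int floor;       // first cached code point (inclusive)
        private final int ceiling;     // last cached code point (inclusive)
        private final Object[] cache;  // cache[cp - floor] holds the precomputed answer
        private final IntFunction<V> slowLookup;

        RangeCachedLookup(int floor, int ceiling, IntFunction<V> slowLookup) {
            if (ceiling < floor) {
                throw new IllegalArgumentException("ceiling < floor");
            }
            this.floor = floor;
            this.ceiling = ceiling;
            this.slowLookup = slowLookup;
            this.cache = new Object[ceiling - floor + 1];
            // Pay the slow-lookup cost once per code point in the range, up front.
            for (int cp = floor; cp <= ceiling; cp++) {
                cache[cp - floor] = slowLookup.apply(cp);
            }
        }

        @SuppressWarnings("unchecked")
        V get(int codePoint) {
            if (codePoint >= floor && codePoint <= ceiling) {
                return (V) cache[codePoint - floor]; // hot path: one array read
            }
            return slowLookup.apply(codePoint);      // outside the range: delegate
        }
    }

    public static void main(String[] args) {
        // Stand-in "dictionary" keyed by the first code point of an entry.
        Map<Integer, String> firstChar = new HashMap<>();
        firstChar.put(0x65E5, "nihon");    // U+65E5 is a Han character
        firstChar.put((int) 'a', "apple");

        // Assumed cache range: the CJK Unified Ideographs block U+4E00..U+9FFF.
        RangeCachedLookup<String> lookup =
                new RangeCachedLookup<>(0x4E00, 0x9FFF, cp -> firstChar.get(cp));

        System.out.println(lookup.get(0x65E5)); // served from the array: nihon
        System.out.println(lookup.get('a'));    // falls back to the map: apple
        System.out.println(lookup.get('x'));    // no entry anywhere: null
    }
}
```

The design question in the issue maps onto this sketch: the array costs memory and startup time for every code point in the range, and it only pays off if the fallback lookup is noticeably slower than an array read. If the underlying lookup became cheap enough, the array would buy little, which is why measuring before removing it (for example with the Wikipedia test mentioned above) matters.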



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


