lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5468) Hunspell very high memory use when loading dictionary
Date Thu, 27 Feb 2014 21:43:21 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915083#comment-13915083 ]

Michael McCandless commented on LUCENE-5468:
--------------------------------------------

These are incredible reductions in RAM usage from cutting over to
FSTs.  And it's nice that you are using IntSequenceOutputs, and
that you are now able to load dictionaries that failed before!

I'm not sure it matters here, but do you handle the FST Builder returning
null for the built FST (when nothing was added)?  Just a common
gotcha...
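A minimal sketch of that null check, assuming the Lucene 4.x FST API (lucene-core 4.x on the classpath; later releases changed the Builder and IntsRef signatures). The class name FstNullCheck, the build/lookup helpers and the word-to-int[] map are hypothetical illustrations, not the code in the attached patch.

```java
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.lucene.util.IntsRef;
import org.apache.lucene.util.fst.Builder;
import org.apache.lucene.util.fst.FST;
import org.apache.lucene.util.fst.IntSequenceOutputs;
import org.apache.lucene.util.fst.Util;

// Hypothetical helper, assuming the Lucene 4.x FST API.
public class FstNullCheck {

  /** Builds a word -> int[] FST; returns null when the input map is empty. */
  static FST<IntsRef> build(Map<String, int[]> entries) throws IOException {
    IntSequenceOutputs outputs = IntSequenceOutputs.getSingleton();
    // BYTE4 inputs: each label is one Unicode code point (UTF-32).
    Builder<IntsRef> builder = new Builder<IntsRef>(FST.INPUT_TYPE.BYTE4, outputs);
    IntsRef scratch = new IntsRef();
    // Inputs must be added in sorted order. A TreeMap iterates keys in String
    // (UTF-16) order, which matches code point order for BMP-only keys.
    for (Map.Entry<String, int[]> e : new TreeMap<String, int[]>(entries).entrySet()) {
      int[] ords = e.getValue();
      builder.add(Util.toUTF32(e.getKey(), scratch), new IntsRef(ords, 0, ords.length));
    }
    // The gotcha: finish() returns null if nothing was added.
    return builder.finish();
  }

  /** Looks up a word, treating a null FST as an empty dictionary. */
  static IntsRef lookup(FST<IntsRef> fst, String word) throws IOException {
    if (fst == null) {
      return null; // no FST was built, so no word can match
    }
    // Util.get returns null when the word is not accepted by the FST.
    return Util.get(fst, Util.toUTF32(word, new IntsRef()));
  }
}
```

Guarding in lookup() keeps an empty dictionary from surfacing later as a NullPointerException deep inside a stemming call.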

Do you have any sense of how the lookup speed changed?


> Hunspell very high memory use when loading dictionary
> -----------------------------------------------------
>
>                 Key: LUCENE-5468
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5468
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 3.5
>            Reporter: Maciej Lisiewski
>            Priority: Minor
>         Attachments: LUCENE-5468.patch, patch.txt
>
>
> The Hunspell stemmer requires gigantic (for the task) amounts of memory to load dictionary/rules files.
> For example, loading a 4.5 MB Polish dictionary (with an empty index!) will cause the whole core to crash with various out-of-memory errors unless you set the max heap size close to 2 GB or more.
> By comparison, Stempel using the same dictionary file works just fine with 1/8 of that (and possibly lower values as well).
> Sample error log entries:
> http://pastebin.com/fSrdd5W1
> http://pastebin.com/Lmi0re7Z



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


