lucene-dev mailing list archives

From "Robert Muir (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENE-3699) kuromoji dictionary could be more compact
Date Mon, 16 Jan 2012 17:03:39 GMT
kuromoji dictionary could be more compact
-----------------------------------------

                 Key: LUCENE-3699
                 URL: https://issues.apache.org/jira/browse/LUCENE-3699
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Robert Muir
         Attachments: LUCENE-3699.patch

Reading through the ipadic documentation, I realized we are storing a lot of redundant information.
For example, the connection costs for bigram weights are based on POS+inflection data, so it's
redundant to also encode POS and inflection data separately for each entry.

With the patch, dictionary access is also faster and simpler, and TokenInfoDictionary is
1.5 MB smaller.
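To illustrate the redundancy, here is a minimal Java sketch. It is not the code in LUCENE-3699.patch, and the class and method names are hypothetical. It assumes that each connection-cost context ID determines exactly one POS+inflection string and, for simplicity, uses a single ID for both the left and right context. Each entry keeps only its context ID and word cost. POS and inflection are recovered through one shared table, and the same ID indexes the bigram cost matrix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the LUCENE-3699 patch. Assumption: a context ID
// determines exactly one POS+inflection string, and one ID serves as both the
// left and right context (a simplification for illustration).
final class CompactDictionarySketch {
    // Shared table: context ID -> POS+inflection feature string (stored once).
    private final List<String> featuresById = new ArrayList<>();
    private final Map<String, Integer> idByFeatures = new HashMap<>();
    // Per-entry data, indexed by word ID: {contextId, wordCost}.
    private final List<int[]> entries = new ArrayList<>();

    /** Adds an entry, reusing the context ID of identical features; returns the word ID. */
    int addEntry(String posAndInflection, int wordCost) {
        Integer id = idByFeatures.get(posAndInflection);
        if (id == null) {
            id = featuresById.size();
            featuresById.add(posAndInflection);
            idByFeatures.put(posAndInflection, id);
        }
        entries.add(new int[] {id, wordCost});
        return entries.size() - 1;
    }

    int getContextId(int wordId) { return entries.get(wordId)[0]; }

    int getWordCost(int wordId) { return entries.get(wordId)[1]; }

    /** POS+inflection is derived from the context ID instead of being stored per entry. */
    String getPosAndInflection(int wordId) {
        return featuresById.get(getContextId(wordId));
    }

    /** Bigram connection cost, indexed [previous context ID][next context ID]. */
    int connectionCost(int[][] costs, int prevWordId, int nextWordId) {
        return costs[getContextId(prevWordId)][getContextId(nextWordId)];
    }

    /** Number of feature strings actually stored, one per distinct context ID. */
    int distinctFeatureCount() { return featuresById.size(); }
}
```

In this sketch, entries that share POS+inflection data share one stored string. The ID needed for the connection-cost lookup is also the key for recovering POS and inflection, so nothing else has to be encoded per entry.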

