lucene-dev mailing list archives

From "Michael McCandless (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENE-3767) Explore streaming Viterbi search in Kuromoji
Date Thu, 09 Feb 2012 23:40:04 GMT
Explore streaming Viterbi search in Kuromoji
--------------------------------------------

                 Key: LUCENE-3767
                 URL: https://issues.apache.org/jira/browse/LUCENE-3767
             Project: Lucene - Java
          Issue Type: Improvement
          Components: modules/analysis
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: 3.6, 4.0


I've been playing with the idea of changing the Kuromoji Viterbi
search to be 2 passes (intersect, backtrace) instead of 4 passes
(break into sentences, intersect, score, backtrace). This is very
much a work in progress, so I'm just getting my current state up.
It has tons of nocommits and doesn't yet properly handle the user
dict or the extended modes.
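
To make the two-pass shape concrete, here is a minimal sketch. It is not
the Kuromoji code or the patch on this issue: the class name, the toy
dictionary, and its costs are all hypothetical, and it only uses per-word
costs (no connection costs, no unknown-word handling). Pass 1 looks up
dictionary words at each position and scores them in the same loop; pass 2
follows the back pointers from the end of the text.

```java
import java.util.*;

class TwoPassViterbi {
    // Hypothetical toy dictionary: surface form -> word cost (lower is better).
    static final Map<String, Integer> DICT = new HashMap<>();
    static {
        DICT.put("a", 5);
        DICT.put("b", 5);
        DICT.put("c", 5);
        DICT.put("ab", 8);
        DICT.put("bc", 12);
    }
    static final int MAX_WORD_LEN = 2; // longest entry in the toy dictionary

    static List<String> segment(String text) {
        int n = text.length();
        int[] bestCost = new int[n + 1]; // cheapest known cost to reach each position
        int[] backPtr = new int[n + 1];  // start offset of the token ending at each position
        Arrays.fill(bestCost, Integer.MAX_VALUE);
        Arrays.fill(backPtr, -1);
        bestCost[0] = 0;

        // Pass 1 (intersect): match dictionary words and score them as we go.
        for (int start = 0; start < n; start++) {
            if (bestCost[start] == Integer.MAX_VALUE) {
                continue; // no path reaches this position
            }
            int maxEnd = Math.min(n, start + MAX_WORD_LEN);
            for (int end = start + 1; end <= maxEnd; end++) {
                Integer wordCost = DICT.get(text.substring(start, end));
                if (wordCost == null) {
                    continue;
                }
                int cost = bestCost[start] + wordCost;
                if (cost < bestCost[end]) {
                    bestCost[end] = cost;
                    backPtr[end] = start;
                }
            }
        }
        if (n > 0 && backPtr[n] == -1) {
            throw new IllegalArgumentException("no segmentation for: " + text);
        }

        // Pass 2 (backtrace): walk the back pointers from the end of the text.
        LinkedList<String> tokens = new LinkedList<>();
        for (int end = n; end > 0; end = backPtr[end]) {
            tokens.addFirst(text.substring(backPtr[end], end));
        }
        return tokens;
    }
}
```

With this toy dictionary, `TwoPassViterbi.segment("abc")` picks "ab" + "c"
(cost 13) over "a" + "bc" (cost 17) and "a" + "b" + "c" (cost 15).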

One thing I'm playing with is adding a double backtrace for the long
compound tokens, i.e., instead of penalizing these tokens so that
shorter tokens are picked, leave the scores unchanged but on backtrace
take that penalty and use it as a threshold for a 2nd best
segmentation.
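
One possible reading of the double-backtrace idea, sketched under heavy
assumptions: the class, the dictionary, the costs, and the LONG_TOKEN_LEN
cutoff below are hypothetical and are not taken from the patch. For each
long token on the best path, the sketch searches for the cheapest
segmentation of the same span that avoids the compound itself, and accepts
it only if its extra cost stays within the penalty.

```java
import java.util.*;

class DoubleBacktrace {
    // Hypothetical toy dictionary: surface form -> word cost (lower is better).
    static final Map<String, Integer> DICT = new HashMap<>();
    static {
        DICT.put("abcd", 10);
        DICT.put("ab", 7);
        DICT.put("cd", 7);
    }
    static final int LONG_TOKEN_LEN = 4; // hypothetical cutoff for a "long" token

    /** Cheapest split of token into shorter words, or null if none is within penalty. */
    static List<String> secondBest(String token, int penalty) {
        Integer compoundCost = DICT.get(token);
        if (compoundCost == null) {
            return null;
        }
        int n = token.length();
        int[] best = new int[n + 1];
        int[] back = new int[n + 1];
        Arrays.fill(best, Integer.MAX_VALUE);
        Arrays.fill(back, -1);
        best[0] = 0;
        for (int start = 0; start < n; start++) {
            if (best[start] == Integer.MAX_VALUE) {
                continue;
            }
            for (int end = start + 1; end <= n; end++) {
                if (start == 0 && end == n) {
                    continue; // skip the compound itself
                }
                Integer cost = DICT.get(token.substring(start, end));
                if (cost != null && best[start] + cost < best[end]) {
                    best[end] = best[start] + cost;
                    back[end] = start;
                }
            }
        }
        // Use the penalty as a threshold instead of adding it to the scores.
        if (back[n] == -1 || best[n] - compoundCost > penalty) {
            return null;
        }
        LinkedList<String> parts = new LinkedList<>();
        for (int end = n; end > 0; end = back[end]) {
            parts.addFirst(token.substring(back[end], end));
        }
        return parts;
    }

    /** Second backtrace over an already-chosen best path. */
    static List<String> resegment(List<String> bestPath, int penalty) {
        List<String> out = new ArrayList<>();
        for (String tok : bestPath) {
            List<String> alt = tok.length() >= LONG_TOKEN_LEN ? secondBest(tok, penalty) : null;
            if (alt != null) {
                out.addAll(alt);
            } else {
                out.add(tok);
            }
        }
        return out;
    }
}
```

Here "ab" + "cd" costs 14 against 10 for "abcd", so a penalty of 5 splits
the compound while a penalty of 3 keeps it. Whether the shorter tokens
should replace the compound or be emitted alongside it is left open; this
sketch simply replaces it.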

