Date: Thu, 12 Jan 2012 20:46:39 +0000 (UTC)
From: "Robert Muir (Commented) (JIRA)"
To: dev@lucene.apache.org
Message-ID: <1297512358.35901.1326401199440.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <318605414.5210.1310455200127.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (LUCENE-3305) Kuromoji code donation - a new Japanese morphological analyzer

[ 
https://issues.apache.org/jira/browse/LUCENE-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185222#comment-13185222 ]

Robert Muir commented on LUCENE-3305:
-------------------------------------

Yes, thanks also to Uwe for lots of work compressing data and refactoring, and Mike for tuning the FSTs.

> Kuromoji code donation - a new Japanese morphological analyzer
> --------------------------------------------------------------
>
>                 Key: LUCENE-3305
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3305
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/analysis
>            Reporter: Christian Moen
>            Assignee: Robert Muir
>             Fix For: 4.0
>
>         Attachments: Kuromoji short overview .pdf, LUCENE-3305.patch, LUCENE-3305.patch, ip-clearance-Kuromoji.xml, ip-clearance-Kuromoji.xml, kuromoji-0.7.6-asf.tar.gz, kuromoji-0.7.6.tar.gz, kuromoji-solr-0.5.3-asf.tar.gz, kuromoji-solr-0.5.3.tar.gz, wordid0.patch
>
>
> Atilika Inc. (アティリカ株式会社) would like to donate the Kuromoji Japanese morphological analyzer to the Apache Software Foundation in the hope that it will be useful to Lucene and Solr users in Japan and elsewhere.
> The project was started in 2010 because we couldn't find any high-quality, actively maintained and easy-to-use Java-based Japanese morphological analyzers, and these became many of our design goals for Kuromoji.
> Kuromoji also has a segmentation mode that is particularly useful for search, which we hope will interest Lucene and Solr users. Compound nouns, such as 関西国際空港 (Kansai International Airport) and 日本経済新聞 (Nikkei Newspaper), are segmented as one token with most analyzers. As a result, a search for 空港 (airport) or 新聞 (newspaper) will not give you a hit for these words. Kuromoji can segment these words into 関西 国際 空港 and 日本 経済 新聞, which is generally what you would want for search, and you'll get a hit.
> We also wanted to make sure the technology has a license that makes it compatible with other Apache Software Foundation software to maximize its usefulness. Kuromoji has an Apache License 2.0 and all code is currently owned by Atilika Inc. The software has been developed by my good friend and ex-colleague Masaru Hasegawa and myself.
> Kuromoji uses the so-called IPADIC for its dictionary/statistical model and its license terms are described in NOTICE.txt.
> I'll upload code distributions and their corresponding hashes and I'd very much like to start the code grant process. I'm also happy to provide patches to integrate Kuromoji into the codebase, if you prefer that.
> Please advise on how you'd like me to proceed with this. Thank you.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org