Date: Tue, 9 Aug 2011 09:04:27 +0000 (UTC)
From: "Christian Moen (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] [Commented] (LUCENE-3305) Kuromoji code donation - a new Japanese morphological analyzer

    [ https://issues.apache.org/jira/browse/LUCENE-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081526#comment-13081526 ]

Christian Moen commented on LUCENE-3305:
----------------------------------------

Please see {{NOTICE.txt}} for information on the dictionaries.

Kindly let me know which files require a license header and how I should proceed to provide a revised version. Do you prefer a complete tarball, or can I attach the files individually to this JIRA?

Thanks!

> Kuromoji code donation - a new Japanese morphological analyzer
> --------------------------------------------------------------
>
>                 Key: LUCENE-3305
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3305
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/analysis
>            Reporter: Christian Moen
>            Assignee: Simon Willnauer
>         Attachments: Kuromoji short overview .pdf, ip-clearance-Kuromoji.xml, kuromoji-0.7.6-asf.tar.gz, kuromoji-0.7.6.tar.gz, kuromoji-solr-0.5.3-asf.tar.gz, kuromoji-solr-0.5.3.tar.gz
>
>
> Atilika Inc. (アティリカ株式会社) would like to donate the Kuromoji Japanese morphological analyzer to the Apache Software Foundation in the hope that it will be useful to Lucene and Solr users in Japan and elsewhere.
> The project was started in 2010 because we couldn't find any high-quality, actively maintained and easy-to-use Java-based Japanese morphological analyzers, and these became many of our design goals for Kuromoji.
> Kuromoji also has a segmentation mode that is particularly useful for search, which we hope will interest Lucene and Solr users. Compound nouns, such as 関西国際空港 (Kansai International Airport) and 日本経済新聞 (Nikkei Newspaper), are segmented as one token with most analyzers.
> As a result, a search for 空港 (airport) or 新聞 (newspaper) will not give you a hit for these words. Kuromoji can segment these words into 関西 国際 空港 and 日本 経済 新聞, which is generally what you want for search, so you'll get a hit.
> We also wanted to make sure the technology has a license that makes it compatible with other Apache Software Foundation software to maximize its usefulness. Kuromoji is licensed under the Apache License 2.0 and all code is currently owned by Atilika Inc. The software has been developed by my good friend and ex-colleague Masaru Hasegawa and myself.
> Kuromoji uses the so-called IPADIC for its dictionary/statistical model, and its license terms are described in NOTICE.txt.
> I'll upload code distributions and their corresponding hashes, and I'd very much like to start the code grant process. I'm also happy to provide patches to integrate Kuromoji into the codebase, if you prefer that.
> Please advise on how you'd like me to proceed with this. Thank you.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira