lucene-java-user mailing list archives

From Koji Sekiguchi <k...@r.email.ne.jp>
Subject Re: lucene gosen diff btn jars
Date Sat, 03 Mar 2012 00:46:29 GMT
Hi Thushara,

Please use the lucene-gosen mailing list for lucene-gosen questions:

http://groups.google.com/group/lucene-gosen

Thanks,

koji
-- 
Query Log Visualizer for Apache Solr
http://soleami.com/

(12/03/03 6:41), Thushara Wijeratna wrote:
> I'm testing lucene-gosen for Japanese tokenization and wondering what the
> differences are between the two jars provided (ipadic / naist-chasen).
> In my preliminary testing, I'm not seeing any difference in tokenization
> between these two jars. (The jar with no dictionary did not work; I assume I
> need to supply a custom dictionary, header.sen, which I did not try.)
> 
> I tried to tokenize this phrase:
> 
> ゴルフが大好きなあなた。
> アメリカにあるベスト・ゴルフコース情報が満載のイエローページ・ジャパンでは、オンラインまたはガイド・ブックからもあらゆる情報が簡単に入手できます。
> 詳しい情報は
> 
> 
> which Google Translate renders as
> 
> 
> You love golf. Best golf course information in the United States is in the
> Yellow Pages Japan is full of, any information can be obtained easily from
> online or book guide. For more information
> 
> 
> I'm getting identical tokenization from both jars, namely :
> 
> 
>   ゴルフ / Golf
>   大好き / I love
>   あなた / You
>   アメリカ / America
>   ベスト / best
>   ゴルフコース / Golf course
>   情報 / information
>   満載 / save
>   イエロ / Hierro
>   ページ / page
>   ジャパン / Japan
>   オンライン / online
>   ガイド / guide
>   ブック / book
>   あらゆる / all
>   情報 / information
>   簡単 / simple
>   入手 / obtaining
>   できる / able to
>   詳しい / detailed
>   情報 / information
> 
> 
> Note: translations based on Google Translate
> 
> 
> Any pointers you can provide on how the two jars differ in tokenizing
> would be highly appreciated.
> 
> 
> thx,
> 
> thushara
> 




