lucene-java-user mailing list archives

From Danil ŢORIN <torin...@gmail.com>
Subject Re: Lucene and Chinese language
Date Thu, 01 Jul 2010 09:30:21 GMT
Try using the CJK analyzer for both indexing and searching Chinese text.
Then you won't need the "text" -> "*text*" transformation.
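To show why the wildcard trick becomes unnecessary, here is a minimal, self-contained sketch of the overlapping-bigram idea behind a CJK-style analyzer. It is a simplified illustration, not Lucene's CJKAnalyzer (which also handles Latin runs, stopwords and more), and the class and method names are hypothetical. The point is that the document and the query are split the same way, so a search can match bigram sequences directly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of overlapping-bigram tokenization for Han text.
// NOT Lucene's CJKAnalyzer; it only approximates the core idea.
class CjkBigramSketch {

    // Returns overlapping two-character tokens; a single character
    // is returned as a token on its own.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        // Work on code points so characters outside the BMP are not split.
        int[] cps = text.codePoints().toArray();
        if (cps.length == 1) {
            tokens.add(new String(cps, 0, 1));
            return tokens;
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            tokens.add(new String(cps, i, 2));
        }
        return tokens;
    }

    // Phrase-style match: the query's bigrams must appear consecutively
    // in the document's bigram stream.
    static boolean phraseMatches(String document, String query) {
        return Collections.indexOfSubList(bigrams(document), bigrams(query)) >= 0;
    }

    public static void main(String[] args) {
        String document = "我在佛山东方书城买书";
        System.out.println(bigrams("佛山东方书城"));
        System.out.println(phraseMatches(document, "东方书城"));
        System.out.println(phraseMatches(document, "书城东方"));
    }
}
```

This also hints at where the false positives come from: "佛山东方" produces the bigram "山东" (Shandong), so a query for "山东" would match this document even though it is about Foshan.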

There might be some false positives in the results, though.
You may also want to try the smartcn analyzer, which is dictionary-based, but
I don't have the expertise to evaluate its results (we still use CJK for Asian
languages, as there have been no complaints so far).
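For a rough idea of what "dictionary-based" segmentation means, the sketch below uses greedy forward maximum matching, a classic and much simpler technique. It is not how smartcn works internally, and the toy dictionary and names are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Greedy forward maximum matching: a simple form of dictionary-based
// word segmentation. Illustration only; smartcn uses its own algorithm.
class MaxMatchSketch {

    // Hypothetical toy dictionary, for illustration only.
    static final Set<String> DICTIONARY = Set.of("佛山", "东方", "书城", "东方书城");
    static final int MAX_WORD_LENGTH = 4;

    static List<String> segment(String text, Set<String> dict, int maxLen) {
        List<String> words = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            // Try the longest candidate first and shrink until a dictionary hit.
            int len = Math.min(maxLen, cps.length - i);
            while (len > 1 && !dict.contains(new String(cps, i, len))) {
                len--;
            }
            // Unknown characters fall through as single-character words.
            words.add(new String(cps, i, len));
            i += len;
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(segment("佛山东方书城", DICTIONARY, MAX_WORD_LENGTH));
    }
}
```

With this dictionary the text splits into "佛山" and "东方书城", so no spurious "山东" token is produced. The trade-off is that the quality depends entirely on the dictionary, whereas bigrams need no dictionary at all.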


2010/7/1 Kolhoff, Jacqueline - ENCOWAY <Kolhoff@encoway.de>

>
> Hi!
>
> We are using lucene in our project to search through information objects
> which works fine. For indexing we use the StandardAnalyzer.
> Now, we have to support the Chinese language. I found out that the Chinese
> words and letters are correctly saved in the index but the query to search
> for them does not work. Example: in English language the query is “text”
> which we parse to “*text*”. If we search for Chinese words / phrases like
> “佛山东方书城”, the query is “*佛山东方书城*”, but there are no search
> results. If the query places blanks between the single letters / symbols
> like this, “*佛 山 东 方 书 城*”, we are getting results. Does the
> StandardAnalyzer interpret each
> Chinese letter as one word? What are best practices for this case? Shall we
> use another analyzer (Chinese analyzer)? Or is it better to replace the
> query parser in this case?
>
> Regards,
> Jacqueline.
>
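The behaviour described in the question can be reproduced with a small simulation. This sketch assumes, consistent with what the question reports, that the index holds one term per Chinese character and that a wildcard term is compared against individual index terms without being analyzed. It does not call Lucene, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical simulation of wildcard matching against per-character terms.
class WildcardSketch {

    // Simulate an index that stores one term per Chinese character.
    static List<String> perCharacterTerms(String text) {
        List<String> terms = new ArrayList<>();
        text.codePoints().forEach(cp -> terms.add(new String(Character.toChars(cp))));
        return terms;
    }

    // Translate a simple wildcard pattern ('*' and '?') into a regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        wildcard.codePoints().forEach(cp -> {
            if (cp == '*') sb.append(".*");
            else if (cp == '?') sb.append('.');
            else sb.append(Pattern.quote(new String(Character.toChars(cp))));
        });
        return Pattern.compile(sb.toString());
    }

    // True if at least one single index term matches the wildcard pattern.
    static boolean anyTermMatches(List<String> terms, String wildcard) {
        Pattern p = toRegex(wildcard);
        return terms.stream().anyMatch(t -> p.matcher(t).matches());
    }

    public static void main(String[] args) {
        List<String> terms = perCharacterTerms("佛山东方书城");
        // A six-character pattern cannot match any one-character term.
        System.out.println(anyTermMatches(terms, "*佛山东方书城*"));
        // Split on blanks, each piece matches a single-character term on its own.
        for (String piece : "*佛 山 东 方 书 城*".split(" ")) {
            System.out.println(piece + " -> " + anyTermMatches(terms, piece));
        }
    }
}
```

Since the query parser combines the blank-separated pieces with OR by default, the second form returns any document that contains any of those characters. It finds the right documents, but it also finds many unrelated ones.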
