lucene-java-user mailing list archives

From "Kolhoff, Jacqueline - ENCOWAY" <>
Subject Lucene and Chinese language
Date Thu, 01 Jul 2010 09:19:29 GMT


We use Lucene in our project to search through information objects, and this works fine.
For indexing we use the StandardAnalyzer.
Now we have to support Chinese. I found that the Chinese words and characters
are stored correctly in the index, but queries for them do not work. Example:
for English, the user's input is “text”, which we turn into the query “*text*”. If we search for
a Chinese word or phrase such as “佛山东方书城”, the query is “*佛山东方书城*”,
but there are no search results. If the query has blanks between the individual characters,
like “*佛 山 东 方 书 城*”, we do get results. Does the StandardAnalyzer
treat each Chinese character as a separate word? What are best practices for this case? Should we
use another analyzer (Chinese analyzer)? Or is it better to replace the query parser in this
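
To make the observed behaviour concrete, here is a minimal sketch in plain Java. It does not use Lucene's API. It only simulates two assumptions: that each Chinese character ends up as its own term in the index (which the results above suggest), and that a wildcard clause is compared against single index terms rather than against the whole text. The class and method names (CjkWildcardSketch, indexTerms, clauseMatches, queryMatches) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class CjkWildcardSketch {

    // Assumption: every non-whitespace character becomes its own index term.
    static List<String> indexTerms(String text) {
        List<String> terms = new ArrayList<>();
        text.codePoints()
            .filter(cp -> !Character.isWhitespace(cp))
            .forEach(cp -> terms.add(new String(Character.toChars(cp))));
        return terms;
    }

    // A wildcard clause such as "*abc*" is tested against each single term.
    static boolean clauseMatches(String clause, List<String> terms) {
        String[] parts = clause.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*"); // '*' stands for any run of characters
            }
            regex.append(Pattern.quote(parts[i]));
        }
        Pattern pattern = Pattern.compile(regex.toString());
        return terms.stream().anyMatch(t -> pattern.matcher(t).matches());
    }

    // Simplification: blanks split the query into clauses, and every clause
    // must match some term.
    static boolean queryMatches(String query, List<String> terms) {
        for (String clause : query.trim().split("\\s+")) {
            if (!clauseMatches(clause, terms)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> terms = indexTerms("佛山东方书城");
        System.out.println(queryMatches("*佛山东方书城*", terms));      // false
        System.out.println(queryMatches("*佛 山 东 方 书 城*", terms)); // true
    }
}
```

Under these assumptions, no single-character term can match a pattern that requires all six characters in a row, whereas each blank-separated clause matches one term. That would be consistent with the results described above, but it is only a model of the behaviour, not a statement about how Lucene is implemented.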
