Hi,
We are trying to index HTML files that contain Japanese, Korean and Chinese
content using the CJK analyzer. While indexing, however, we get a lexical
parse error: "Encountered unknown character". We tried setting the string
encoding to UTF-8, but it does not help.
Can anyone please help? Any pointers would be highly appreciated.
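One possible cause, offered here only as an assumption: the HTML bytes reach the parser after being decoded with the platform default charset instead of UTF-8, so multi-byte CJK characters arrive as characters the lexer does not expect. The sketch below uses only the JDK (no Lucene classes). It opens the input through an InputStreamReader with an explicit UTF-8 charset and shows that decoding the same bytes as ISO-8859-1 does not reproduce the original text. The class and method names (`Utf8ReaderSketch`, `utf8Reader`, `readUtf8`) are hypothetical. Whichever parser or field consumes the HTML would then need to be handed this Reader rather than a raw InputStream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper: decode input explicitly as UTF-8 instead of the platform default.
public class Utf8ReaderSketch {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    /** Wraps raw bytes in a Reader that always decodes them as UTF-8. */
    public static Reader utf8Reader(InputStream in) {
        return new BufferedReader(new InputStreamReader(in, UTF8));
    }

    /** Reads the whole stream as UTF-8 text and closes it. */
    public static String readUtf8(InputStream in) throws IOException {
        Reader r = utf8Reader(in);
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes keep this source file independent of javac's -encoding setting:
        // Japanese, Korean and Chinese sample words.
        String html = "<html><body>\u65e5\u672c\u8a9e \ud55c\uad6d\uc5b4 \u4e2d\u6587</body></html>";
        byte[] bytes = html.getBytes(UTF8);

        // Correct: bytes decoded as UTF-8 round-trip to the original text.
        String decoded = readUtf8(new ByteArrayInputStream(bytes));
        System.out.println(decoded.equals(html));

        // Wrong: the same bytes decoded as ISO-8859-1 turn into unrelated Latin-1 characters.
        String wrong = new String(bytes, Charset.forName("ISO-8859-1"));
        System.out.println(wrong.equals(html));
    }
}
```

Running `main` prints `true` and then `false`. If the files are not actually UTF-8 encoded, the charset passed to InputStreamReader would have to match their real encoding instead.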
Thanks
--
View this message in context: http://www.nabble.com/Chinese-Japanese-Korean-Indexing-issue-Version-2.4-tp25388003p25388003.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.