lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Multiple Language Indexing and Searching
Date Tue, 06 Sep 2005 12:51:02 GMT

On Sep 6, 2005, at 7:15 AM, Hacking Bear wrote:

> On 9/6/05, Olivier Jaquemet <olivier.jaquemet@jalios.com> wrote:
>
>>
>> As far as your usage is concerned, it seems to be the right approach,
>> and I think the StandardAnalyzer does the job pretty right when it  
>> has
>> to deal with whatever language you want.
>>
>
>  I should look into exactly what it does. Does this  
> StandardAnalyzer handle
> non-European languages like Chinese?

StandardAnalyzer recognizes the CJK range of characters and emits  
them each as individual tokens.  This is not an ideal way to deal  
with Chinese (不好 :) - but it does at least maintain the characters  
rather than throwing them away or blurring consecutive characters  
together.

     Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message