lucene-java-user mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: surrogate pairs
Date Fri, 12 Mar 2010 06:44:54 GMT
Hi Yuta,
Are you looking for a specific analyzer like CJKAnalyzer, or are you
looking for token filters like LowerCaseFilter etc.?
A fair number of the token filters on trunk have already been converted
to handle surrogate pairs correctly. If you need help figuring out how
to use stuff from trunk, I'm happy to help.
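
To give a rough idea of what "handling surrogate pairs correctly" means
inside a filter, here is a minimal sketch. It uses only java.lang.Character
and is not the actual Lucene code; the class and method names
(SurrogateDemo, lowerCaseByCodePoint) are made up for illustration. The
point is to walk the text by code point instead of by char, so a
supplementary character is never split in half.

```java
public class SurrogateDemo {

    // Lowercase a string code point by code point, so that a supplementary
    // character (stored as a high/low surrogate pair) is treated as one unit.
    public static String lowerCaseByCodePoint(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);            // reads both halves of a pair
            out.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);             // advance by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+10400 DESERET CAPITAL LETTER LONG I, written as a surrogate pair.
        String s = "\uD801\uDC00";
        System.out.println(s.length());                       // UTF-16 chars
        System.out.println(s.codePointCount(0, s.length()));  // code points
        String lower = lowerCaseByCodePoint(s);
        System.out.println(Integer.toHexString(lower.codePointAt(0)));
    }
}
```

A char-by-char loop calling Character.toLowerCase(char) would leave both
surrogates untouched, because neither half has a case mapping on its own.
Note also that String.length() counts UTF-16 chars rather than code
points, which matters if you compute term lengths yourself.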

simon

On Fri, Mar 12, 2010 at 5:27 AM, Yuta Kawadai <yutax77@gmail.com> wrote:
> Thank you.
>
> Right now I use my own Analyzer, which is based on "MeCab" (an open source
> Japanese morphological analyzer).
> I will try to modify it to support surrogate pairs.
>
> And I'm looking forward to the next release!
>
> Yuta
>
> 2010/3/11 Robert Muir <rcmuir@gmail.com>:
>> On Wed, Mar 10, 2010 at 6:52 PM, Yuta Kawadai <yutax77@gmail.com> wrote:
>>> Hi
>>>
>>> Can Lucene handle surrogate pairs (including their term positions and lengths)?
>>>
>>> Thanks,
>>> Yuta
>>
>> Yes, just make sure you use an Analyzer that supports them.
>> Unfortunately, most of the ones included with released versions of
>> Lucene (e.g. CJKAnalyzer) will not do the right thing; hopefully
>> they will in the next release.
>>
>> --
>> Robert Muir
>> rcmuir@gmail.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
