lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From kafka0102 <kafka0...@163.com>
Subject Re: Lucene and Chinese language
Date Fri, 02 Jul 2010 07:09:15 GMT
you can choose ik : http://code.google.com/p/ik-analyzer/ for this 
problem. It generates a better token result than standard and CJK. And, 
you can use its IKQueryParser instead of queryparser. It generates "and 
queries" instead of " phrase queries".



On 2010年07月01日 20:47, Robert Muir wrote:
> its really just a bad situation for chinese :(
>
> with queryparser, you either get no phrase query support (by using the
> hack), or all queries automatically become phrase queries.
>
> if you want to do the equivalent of *在电力虎*, then you need to use queryparser
> without the hack, just do a query of "在电力虎" (with quotes).
> if you want to do the equivalent of 在 电 力 虎, then you need to use
> queryparser with the hack, just do a query of 在电力虎 (without quotes).
>
>
> 2010/7/1 Kolhoff, Jacqueline - ENCOWAY<Kolhoff@encoway.de>
>
>    
>> Ok, understand!
>>
>> So it is better to use another analyzer in the chinese case at index-time
>> or do you suggest to use another "QueryParser" at query-time?
>>
>> -----Ursprüngliche Nachricht-----
>> Von: Robert Muir [mailto:rcmuir@gmail.com]
>> Gesendet: Donnerstag, 1. Juli 2010 14:35
>> An: java-user@lucene.apache.org
>> Betreff: Re: Lucene and Chinese language
>>
>> 2010/7/1 Kolhoff, Jacqueline - ENCOWAY<Kolhoff@encoway.de>
>>
>>      
>>> As you can see, the query parser automatically added double quotes and
>>> blanks. But this does not work for our English or German queries.
>>>
>>> If I use the PositionHackAnalyzerWrapper and the case with * I got no
>>> results, query is:
>>> +anotherfieldname:description +myfieldname:*在电力虎*
>>>
>>> If I remove the * the query is:
>>> + anotherfieldname: description
>>> +(myfieldname:在myfieldname:电myfieldname:力myfieldname:虎)
>>>
>>> and I got results but not for German or English queries.
>>>
>>>        
>> Weird?
>>      
>>>
>>>        
>> its working correctly, your chinese wildcard query doesnt make sense, as
>> you
>> havent indexed the text in a way to do queries like that (you have indexed
>> individual chars).
>> in practice this is where you would do a chinese phrase query of "在电力虎"
>> (with quotes) instead of *... but if you use the positionfilterhack, you
>> cant do phrase queries.
>>
>> --
>> Robert Muir
>> rcmuir@gmail.com
>>
>>      
>
>
>    



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message