lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Lucene and Chinese language
Date Thu, 01 Jul 2010 12:47:57 GMT
It's really just a bad situation for Chinese :(

With QueryParser, you either get no phrase query support (by using the
hack), or all queries automatically become phrase queries.

If you want the equivalent of *在电力虎*, use QueryParser without the hack
and query "在电力虎" (with quotes).
If you want the equivalent of 在 电 力 虎, use QueryParser with the hack
and query 在电力虎 (without quotes).
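
To make the difference concrete, here is a minimal sketch in plain Java. It
is a toy model and does not use the Lucene API: the class and method names
(UnigramMatchSketch, tokenize, phraseMatch, anyTermMatch) are hypothetical.
It assumes the text is indexed as one token per character, and that the
hack turns the query into an OR over single-character terms, as in the
parsed query quoted below.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of per-character (unigram) matching. Not Lucene API. */
final class UnigramMatchSketch {

    /** One token per non-whitespace code point (assumed unigram analysis). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .filter(cp -> !Character.isWhitespace(cp))
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    /** Phrase semantics: query tokens must appear at consecutive positions. */
    static boolean phraseMatch(String doc, String query) {
        List<String> queryTokens = tokenize(query);
        if (queryTokens.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(tokenize(doc), queryTokens) >= 0;
    }

    /** OR semantics: any single query token anywhere in the doc is a hit. */
    static boolean anyTermMatch(String doc, String query) {
        List<String> docTokens = tokenize(doc);
        for (String term : tokenize(query)) {
            if (docTokens.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String query = "在电力虎";
        String exact = "他在电力虎工作"; // hypothetical document with the full run
        String partial = "电力公司";     // hypothetical document sharing two chars

        System.out.println(phraseMatch(exact, query));
        System.out.println(phraseMatch(partial, query));
        System.out.println(anyTermMatch(partial, query));
    }
}
```

In this model the quoted (phrase) form only hits the document containing
the whole run of characters, while the unquoted form with the hack also
hits the document that merely shares 电 and 力.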


2010/7/1 Kolhoff, Jacqueline - ENCOWAY <Kolhoff@encoway.de>

> Ok, understand!
>
> So is it better to use another analyzer for Chinese at index time,
> or do you suggest using another "QueryParser" at query time?
>
> -----Original Message-----
> From: Robert Muir [mailto:rcmuir@gmail.com]
> Sent: Thursday, July 1, 2010 14:35
> To: java-user@lucene.apache.org
> Subject: Re: Lucene and Chinese language
>
> 2010/7/1 Kolhoff, Jacqueline - ENCOWAY <Kolhoff@encoway.de>
>
> > As you can see, the query parser automatically added double quotes and
> > blanks. But this does not work for our English or German queries.
> >
> > If I use the PositionHackAnalyzerWrapper and the case with * I got no
> > results, query is:
> > +anotherfieldname:description +myfieldname:*在电力虎*
> >
> > If I remove the * the query is:
> > +anotherfieldname:description
> > +(myfieldname:在 myfieldname:电 myfieldname:力 myfieldname:虎)
> >
> > and I got results but not for German or English queries.
> >
>
> Weird?
> >
> >
> It's working correctly. Your Chinese wildcard query doesn't make sense, as
> you haven't indexed the text in a way that supports queries like that (you
> have indexed individual characters).
> In practice this is where you would do a Chinese phrase query of "在电力虎"
> (with quotes) instead of *... but if you use the positionfilter hack, you
> can't do phrase queries.
>
> --
> Robert Muir
> rcmuir@gmail.com
>



-- 
Robert Muir
rcmuir@gmail.com
