lucene-java-user mailing list archives

From Paco Avila <pav...@openkm.com>
Subject Re: Question about chinese and WildcardQuery
Date Thu, 28 Jun 2012 07:23:40 GMT
Thanks for the info.

2012/6/28 Li Li <fancyerii@gmail.com>

> Chinese has no word boundaries between words. It is written like
> "Iamok", which you would tokenize as "I am ok".
> If you want to search for *amo*, you have to index "Iamok" as one token.
> In Chinese, fuzzy search is not very useful. Even with StandardAnalyzer,
> a boolean query works fine: "Iamok" is tokenized as I a m o k, so the
> boolean query +a +m +o matches. Chinese has many characters (more than
> 3000 in common use), and words are very short (most words have only
> 2 characters).
>
>
> On Thu, Jun 28, 2012 at 2:31 PM, Paco Avila <monkiki@gmail.com> wrote:
> > Thanks, using WhitespaceAnalyzer works, but I don't understand why
> > StandardAnalyzer does not work if, according to the ChineseAnalyzer
> > deprecation notice, I should use StandardAnalyzer:
> >
> > @deprecated Use {@link StandardAnalyzer} instead, which has the same
> > functionality.
> >
> > It is very annoying.
> >
> > 2012/6/27 Li Li <fancyerii@gmail.com>
> >
> >> StandardAnalyzer will segment each character into its own token. You should
> >> use WhitespaceAnalyzer, or your own analyzer that can tokenize it as one
> >> token, for wildcard search.
> >> On 2012-6-27 at 6:20 PM, "Paco Avila" <monkiki@gmail.com> wrote:
> >>
> >> > Hi there,
> >> >
> >> > I have to index Chinese content and I don't get the expected results when
> >> > searching. It seems that the WildcardQuery does not work properly with the
> >> > Chinese characters. See attached sample code.
> >> >
> >> > I store the string "专项信息管理.doc" using the StandardAnalyzer and after that
> >> > search for "专项信*" and no result is given. AFAIK, it should match the
> >> > "专项信息管理.doc" string but it doesn't :(
> >> >
> >> > NOTE: I am using Lucene 3.1.0
> >> >
> >> > Regards.
> >> > --
> >> > http://www.openkm.com
> >> > http://www.guia-ubuntu.org
> >> >
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >> >
> >>
> >
> >
> >
> > --
> > OpenKM
> > http://www.openkm.com
> > http://www.guia-ubuntu.org
>
>
>
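
The behaviour discussed in the quoted thread can be illustrated with a small sketch in plain Java. It does not call Lucene at all: `hanPerCharacterTokens`, `whitespaceTokens`, `prefixWildcardMatches` and `allTermsPresent` are hypothetical helpers written for this example, and the tokenizer is only a rough model of the per-character splitting described above, not StandardAnalyzer's actual rules. The point it tries to show is that a wildcard term such as "专项信*" is compared against individual indexed terms, so it can only match when the whole file name was indexed as one term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WildcardTokenDemo {

    // Toy model of per-character tokenization (an assumption, not Lucene code):
    // each Han character becomes its own term; runs of other letters/digits
    // stay together and are lowercased; everything else is a separator.
    public static List<String> hanPerCharacterTokens(String text) {
        List<String> terms = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean han = Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN;
            if (!han && Character.isLetterOrDigit(cp)) {
                run.appendCodePoint(Character.toLowerCase(cp));
                continue;
            }
            if (run.length() > 0) {          // flush the pending non-Han run
                terms.add(run.toString());
                run.setLength(0);
            }
            if (han) {
                terms.add(new String(Character.toChars(cp)));
            }
        }
        if (run.length() > 0) {
            terms.add(run.toString());
        }
        return terms;
    }

    // Toy model of whitespace tokenization: split on whitespace only.
    public static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // A "prefix*" wildcard matches only if a single indexed term starts with the prefix.
    public static boolean prefixWildcardMatches(List<String> terms, String prefix) {
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Boolean AND (+a +b +c): every required term must be present in the index.
    public static boolean allTermsPresent(List<String> terms, List<String> required) {
        return terms.containsAll(required);
    }

    public static void main(String[] args) {
        String name = "专项信息管理.doc";
        List<String> perChar = hanPerCharacterTokens(name);
        List<String> whole = whitespaceTokens(name);

        System.out.println("per-character terms: " + perChar);
        System.out.println("whole-token terms:   " + whole);
        System.out.println("wildcard on per-character terms: "
                + prefixWildcardMatches(perChar, "专项信"));
        System.out.println("wildcard on whole-token terms:   "
                + prefixWildcardMatches(whole, "专项信"));
        System.out.println("boolean AND on per-character terms: "
                + allTermsPresent(perChar, Arrays.asList("专", "项", "信")));
    }
}
```

Note that the boolean AND in this sketch ignores order and adjacency, so it would also accept a name containing the same three characters in other positions.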


-- 
OpenKM
http://www.openkm.com
http://facebook.com/OpenKM.DMS
