lucene-java-user mailing list archives

From Noopur Julka <noopur.ju...@gmail.com>
Subject Re: Efficient string lookup using Lucene
Date Sat, 25 Aug 2012 12:11:13 GMT
Hi,

I have a similar issue: I need Lucene search to work with kanji (Japanese) characters.

The Hits object (or TopDocs) returns zero results for Japanese queries, although
search works well for English. I know my index contains matches, because Luke
(the Lucene index toolbox) displays them.

I tried the lace analyser, but it did not work.

Regards,
Noopur Julka



On Sat, Aug 25, 2012 at 2:28 AM, Ahmet Arslan <iorixxx@yahoo.com> wrote:

>
> > search for a string "run", I do not need to find "ran" but I
> > do want to find it in all of these strings below:
> >
> > Fox is running fast
> > !%#^&$run!$!%@&$#
> > run,run
>
>
> With NGramFilter you can do that, but it creates a lot of tokens. For
> example, "Fox is running fast" becomes:
>
> Fox:     F, o, x, Fo, ox, Fox
> is:      i, s, is
> running: r, u, n, n, i, n, g, ru, un, nn, ni, in, ng, *run*, unn, nni,
>          nin, ing, runn, unni, nnin, ning, runni, unnin, nning, runnin,
>          unning, running
> fast:    f, a, s, t, fa, as, st, fas, ast, fast
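To make the token explosion concrete, here is a minimal plain-Java sketch of character n-gram generation. It is not Lucene's NGramTokenFilter: the class and method names (`NGramDemo`, `ngrams`, `index`), the split on non-alphanumeric characters standing in for a real tokenizer, and the gram sizes 1 to 7 are all assumptions, chosen so that the output reproduces the 47 tokens listed above for "Fox is running fast". It then checks that each of the three example strings yields the token "run" but not "ran".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of n-gram indexing; not Lucene's implementation.
public class NGramDemo {

    // Emit every substring of `word` with length minGram..maxGram,
    // shortest grams first, left to right within each length
    // (the same order as the token list in the reply).
    public static List<String> ngrams(String word, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= Math.min(maxGram, word.length()); n++) {
            for (int start = 0; start + n <= word.length(); start++) {
                grams.add(word.substring(start, start + n));
            }
        }
        return grams;
    }

    // Assumed stand-in for a tokenizer: split on runs of anything that is
    // not a Unicode letter or digit, then n-gram each resulting word.
    public static List<String> index(String text, int minGram, int maxGram) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty()) { // a leading separator yields an empty first element
                tokens.addAll(ngrams(word, minGram, maxGram));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] docs = {"Fox is running fast", "!%#^&$run!$!%@&$#", "run,run"};
        for (String doc : docs) {
            List<String> tokens = index(doc, 1, 7);
            // The query term is looked up as one exact token, not n-grammed itself.
            System.out.println(doc + " -> " + tokens.size() + " tokens, has 'run': "
                    + tokens.contains("run") + ", has 'ran': " + tokens.contains("ran"));
        }
    }
}
```

Running `main` prints the token count for each string and whether the exact token "run" (or "ran") is present. The sketch deliberately looks the query up as a single term; if the query "ran" were n-grammed too and its grams OR'ed together, shared grams such as "r" and "n" would match documents that never contain "ran".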
