lucene-java-user mailing list archives

From András Péteri <apet...@b2international.com>
Subject Re: Lucene same search result for worlds with and without spaces
Date Wed, 20 Jun 2018 09:58:59 GMT
An n-gram tokenizer/filter might also work for you:
http://lucene.apache.org/core/7_3_1/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenizer.html

Regards,
András
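As a rough illustration of why n-grams help here, the sketch below computes character trigrams in plain Java and shows how many are shared between the spaced and unspaced forms. It is not Lucene code: the class and method names are made up for this example, and Lucene's NGramTokenizer may order or filter its output differently depending on configuration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, for illustration only (not part of Lucene).
public class CharNGramSketch {

    // All character n-grams of the text with lengths minGram..maxGram,
    // grouped by start offset.
    public static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                grams.add(text.substring(start, start + len));
            }
        }
        return grams;
    }

    // The n-grams of size n that both strings have in common,
    // in the order they appear in the first string.
    public static Set<String> shared(String a, String b, int n) {
        Set<String> common = new LinkedHashSet<>(ngrams(a, n, n));
        common.retainAll(ngrams(b, n, n));
        return common;
    }

    public static void main(String[] args) {
        // prints [sim, imi, mil, ila, lar, iss, ssu, sue, ues]
        System.out.println(shared("similarissues", "similar issues", 3));
    }
}
```

Nine of the eleven trigrams of "similarissues" also occur in "similar issues", so an n-gram-analyzed field would probably match in both directions, at the cost of a larger index and some loosely related hits.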

On Wed, Jun 20, 2018 at 11:53 AM, Markus Jelsma <markus.jelsma@openindex.io> wrote:

> Hi Egorlex,
>
> Set the tokenSeparator to "" and ShingleFilter will concatenate all
> shingles without whitespace. Keep in mind, this will greatly increase the
> size of the index so it might not be a good idea to concatenate all pairs
> of words.
>
> If you want "similarissues" to match "similar issues" (and vice versa),
> you might want to check out DictionaryCompoundWordTokenFilter and/or
> HyphenationCompoundWordTokenFilter. Although English hardly uses
> compound words, the token filters still do their job quite nicely.
>
> Regards,
> Markus
>
>
>
> -----Original message-----
> > From:egorlex <egorlex@gmail.com>
> > Sent: Wednesday 20th June 2018 11:42
> > To: java-user@lucene.apache.org
> > Subject: Re: Lucene same search result for worlds with and without spaces
> >
> > Thanks for the reply!
> >
> > Sorry, could you help a little more? According to the example:
> >
> > "given the phrase “Shingles is a viral disease”, a shingle filter might
> > produce:
> >
> > Shingles is
> > is a
> > a viral
> > viral disease
> > "
> >
> > I do not quite understand how this ShingleFilter can turn "similarissues"
> > into "similar issues"
> >
> > Thanks!
> >
> >
> >
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Lucene-Java-Users-f532864.html
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
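To make Markus's tokenSeparator suggestion more concrete, here is a small plain-Java sketch of what two-word shingles look like with the default space separator and with an empty one. It only imitates the idea; the class and method names are invented for this example. In Lucene itself the setting would be applied with ShingleFilter.setTokenSeparator(""), and by default the filter also emits the single words, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only (not part of Lucene).
public class ShingleSketch {

    // Joins every run of `size` adjacent tokens with the given separator.
    public static List<String> shingles(List<String> tokens, int size, String separator) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(separator, tokens.subList(i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Shingles is a viral disease".split(" "));
        // prints [Shingles is, is a, a viral, viral disease]
        System.out.println(shingles(tokens, 2, " "));
        // prints [Shinglesis, isa, aviral, viraldisease]
        System.out.println(shingles(tokens, 2, ""));
        // prints [similarissues]
        System.out.println(shingles(Arrays.asList("similar", "issues"), 2, ""));
    }
}
```

So the filter does not split "similarissues" into two words. Instead, text containing "similar issues" is additionally indexed as the single term "similarissues", which is what lets a query for the unspaced form find it.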

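The compound-word filters mentioned above work in the opposite direction: they keep the original token and add any dictionary words found inside it. The following plain-Java sketch shows that idea under simplifying assumptions. The names are hypothetical, the two-word dictionary is made up, and the real DictionaryCompoundWordTokenFilter has further settings (minimum word size, maximum subword size, longest-match-only) that are not modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, for illustration only (not part of Lucene).
public class DecompoundSketch {

    // Returns the original token followed by every dictionary word that
    // occurs inside it and is at least minSubwordSize characters long.
    public static List<String> decompound(String token, Set<String> dictionary, int minSubwordSize) {
        List<String> out = new ArrayList<>();
        out.add(token);
        for (int start = 0; start < token.length(); start++) {
            for (int end = start + minSubwordSize; end <= token.length(); end++) {
                String part = token.substring(start, end);
                // Skip the whole token so it is not listed twice (a choice made for this sketch).
                if (!part.equals(token) && dictionary.contains(part)) {
                    out.add(part);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("similar", "issues"));
        // prints [similarissues, similar, issues]
        System.out.println(decompound("similarissues", dictionary, 2));
    }
}
```

With the parts indexed alongside the original token, a search for "similar issues" can match a document that only contains "similarissues", provided the dictionary covers the relevant words.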