lucene-java-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject RE: Lucene same search result for worlds with and without spaces
Date Wed, 20 Jun 2018 09:53:58 GMT
Hi Egorlex, 

Set the tokenSeparator to "" and ShingleFilter will concatenate the words of each shingle
without whitespace. Keep in mind that this will greatly increase the size of the index, so
concatenating all pairs of words might not be a good idea.
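
To make the direction of the transformation concrete: shingling does not split "similarissues". It glues the indexed words "similar" and "issues" together, so a query for "similarissues" can find them. Below is a minimal plain-Java sketch of that behaviour. It does not use Lucene; the class ShingleSketch and the method shingles are hypothetical names for illustration only, and the code assumes two-word shingles with the single words kept as well. In Lucene itself the separator is set with ShingleFilter.setTokenSeparator("").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: imitates two-word shingling, this is not Lucene code.
public class ShingleSketch {

    // Hypothetical helper: emits every token, each followed by the two-word
    // shingle that starts at that token, joined with the given separator.
    public static List<String> shingles(List<String> tokens, String separator) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i)); // the single word itself
            if (i + 1 < tokens.size()) {
                // an empty separator concatenates the pair without whitespace
                out.add(tokens.get(i) + separator + tokens.get(i + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [similar, similarissues, issues]
        System.out.println(shingles(Arrays.asList("similar", "issues"), ""));
    }
}
```

In this sketch n input words produce n + (n - 1) terms, which is where the index growth comes from.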

If you want "similarissues" to match "similar issues" (and vice versa), you
might want to check out DictionaryCompoundWordTokenFilter and/or HyphenationCompoundWordTokenFilter.
Although English hardly uses compound words, the token filters still do their job quite nicely.
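
The dictionary-based approach works in the opposite direction to shingling: it looks inside a long token for known words and emits them in addition to the original token. A rough plain-Java sketch of that idea follows. It is a simplification and not the Lucene implementation; DecompoundSketch, decompound and the minSubwordSize parameter as used here are assumptions for illustration, and the two-word dictionary is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: a simplified dictionary decompounder, this is not Lucene code.
public class DecompoundSketch {

    // Hypothetical helper: keeps the original token and appends every
    // substring of at least minSubwordSize characters found in the dictionary.
    public static List<String> decompound(String token, Set<String> dictionary, int minSubwordSize) {
        List<String> out = new ArrayList<>();
        out.add(token); // the original token is always kept
        for (int start = 0; start + minSubwordSize <= token.length(); start++) {
            for (int end = start + minSubwordSize; end <= token.length(); end++) {
                String candidate = token.substring(start, end);
                if (dictionary.contains(candidate)) {
                    out.add(candidate);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up dictionary for the example from this thread.
        Set<String> dictionary = new HashSet<>(Arrays.asList("similar", "issues"));
        // prints [similarissues, similar, issues]
        System.out.println(decompound("similarissues", dictionary, 3));
    }
}
```

With the subwords in the index next to the original token, a search for "similar" or "issues" can reach a document that only contains "similarissues". The quality of the result depends on the dictionary you supply.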

Regards,
Markus

 
 
-----Original message-----
> From:egorlex <egorlex@gmail.com>
> Sent: Wednesday 20th June 2018 11:42
> To: java-user@lucene.apache.org
> Subject: Re: Lucene same search result for worlds with and without spaces
> 
> Thanks for the reply!
> 
> Sorry, could you help a little more? According to this example:
> 
> "given the phrase “Shingles is a viral disease”, a shingle filter might
> produce:
> 
> Shingles is
> is a
> a viral
> viral disease
> "
> 
> I do not quite understand how this ShingleFilter can turn "similarissues"
> into "similar issues" 
> 
> Thanks!
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Lucene-Java-Users-f532864.html
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 


