lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject RE: Use case for the Shingle Filter
Date Sun, 05 Mar 2017 10:47:55 GMT
Hello - we use it for text classification and online near-duplicate document detection/filtering.
Using shingles means you want to take word order in the text into account. It is analogous to using
character bigrams and trigrams for language detection: you cannot distinguish between Danish and
Norwegian based solely on single characters.
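
To illustrate the point about order, here is a minimal sketch in plain Python. It is not Lucene's ShingleFilter, only an approximation of the idea; the function names `shingles` and `jaccard` and the whitespace tokenization are assumptions made for this example. It builds word shingles and compares two texts by the Jaccard overlap of their shingle sets:

```python
def shingles(text, size=2):
    """Return the set of word shingles (word n-grams) of the given size.

    Tokenization is a plain lowercase whitespace split, which is an
    assumption for this sketch and much simpler than a real analyzer.
    """
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


original = "the quick brown fox"
scrambled = "fox brown quick the"

# Single words ignore order, so the two texts look identical.
unigram_score = jaccard(shingles(original, 1), shingles(scrambled, 1))

# Two-word shingles capture order, so the two texts share nothing.
bigram_score = jaccard(shingles(original, 2), shingles(scrambled, 2))
```

With single words the two texts are indistinguishable, while with two-word shingles they have no overlap at all. A near-duplicate filter built on this idea would flag document pairs whose shingle overlap exceeds some threshold chosen for the data.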

Markus

 
 
-----Original message-----
> From:Ryan Yacyshyn <ryan.yacyshyn@gmail.com>
> Sent: Sunday 5th March 2017 5:57
> To: solr-user@lucene.apache.org
> Subject: Use case for the Shingle Filter
> 
> Hi everyone,
> 
> I was thinking of using the Shingle Filter to help solve an issue I'm
> facing. I can see this working in the analysis panel in the Solr admin, but
> not when I make my queries.
> 
> I found out it's because the query parser splits the tokens on
> white space before passing them along.
> 
> This made me wonder: what is a practical use case for the shingle
> filter?
> 
> Any enlightenment on this would be much appreciated!
> 
> Thanks,
> Ryan
> 
