lucene-solr-user mailing list archives

From Ryan Josal <r...@josal.com>
Subject Re: Use case for the Shingle Filter
Date Sun, 05 Mar 2017 15:58:40 GMT
I thought newer versions of Solr no longer split on whitespace in the query
parser, so this should work?

That being said, I think I remember it having a problem coming after a
synonym filter.  IIRC, if your input is "Foo Bar" and you have a synonym
"foo <=> baz" you would get foobaz bazbar instead of foobar and bazbar.  I
wrote a custom shingler to account for that.

Ryan
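
A minimal sketch of the synonym problem described above, assuming tokens
are modeled as (term, position increment) pairs in the way Lucene's token
streams are, where an increment of 0 marks a synonym stacked on the previous
position. This is neither Lucene's ShingleFilter nor the custom shingler
mentioned in the message; the function names are hypothetical and the code
only illustrates the difference between pairing tokens in stream order and
pairing them by position.

```python
from itertools import product

# "Foo Bar" after lowercasing and a synonym rule foo <=> baz.
# ("baz", 0) means: same position as the previous token.
tokens = [("foo", 1), ("baz", 0), ("bar", 1)]


def naive_bigrams(tokens):
    """Pair adjacent tokens in stream order, ignoring positions."""
    terms = [term for term, _ in tokens]
    return [a + b for a, b in zip(terms, terms[1:])]


def group_by_position(tokens):
    """Collect terms that share a position into one list per position."""
    positions = []
    for term, increment in tokens:
        if increment == 0 and positions:
            positions[-1].append(term)
        else:
            # Increments larger than 1 (gaps) are treated like 1 here.
            positions.append([term])
    return positions


def position_aware_bigrams(tokens):
    """Pair every term at one position with every term at the next."""
    positions = group_by_position(tokens)
    shingles = []
    for left, right in zip(positions, positions[1:]):
        shingles.extend(a + b for a, b in product(left, right))
    return shingles


print(naive_bigrams(tokens))           # ['foobaz', 'bazbar']
print(position_aware_bigrams(tokens))  # ['foobar', 'bazbar']
```

The naive version reproduces the "foobaz bazbar" output recalled above, while
grouping by position first gives "foobar" and "bazbar".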

On Sun, Mar 5, 2017 at 02:48 Markus Jelsma <markus.jelsma@openindex.io>
wrote:

> Hello - we use it for text classification and online near-duplicate
> document detection/filtering. Using shingles means you want to consider
> order in the text. It is analogous to using bigrams and trigrams when doing
> language detection: you cannot distinguish between Danish and Norwegian
> solely on single characters.
>
> Markus
>
>
>
> -----Original message-----
> > From:Ryan Yacyshyn <ryan.yacyshyn@gmail.com>
> > Sent: Sunday 5th March 2017 5:57
> > To: solr-user@lucene.apache.org
> > Subject: Use case for the Shingle Filter
> >
> > Hi everyone,
> >
> > I was thinking of using the Shingle Filter to help solve an issue I'm
> > facing. I can see this working in the analysis panel in the Solr admin,
> > but not when I make my queries.
> >
> > I found out it's because of the query parser splitting up the tokens on
> > white space before passing them along.
> >
> > This made me wonder what a practical use case can be, for using the
> > shingle filter?
> >
> > Any enlightenment on this would be much appreciated!
> >
> > Thanks,
> > Ryan
> >
>
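
The near-duplicate use case mentioned in the quoted reply can be sketched as
follows. The reply does not say which similarity measure is used, so Jaccard
overlap of word-shingle sets is an assumption here, and the function names
and sample texts are made up for illustration.

```python
def word_shingles(text, size=2):
    """Return the set of word n-grams (shingles) of the given size."""
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


doc_a = "the quick brown fox jumps"
doc_b = "the quick brown fox leaps"   # one word changed
doc_c = "jumps fox brown quick the"   # same words, reversed order

# Bigram shingles are sensitive to word order.
print(jaccard(word_shingles(doc_a), word_shingles(doc_b)))  # 0.6
print(jaccard(word_shingles(doc_a), word_shingles(doc_c)))  # 0.0

# Single words are not: the reversed text looks identical.
print(jaccard(word_shingles(doc_a, 1), word_shingles(doc_c, 1)))  # 1.0
```

With single words the reversed text is indistinguishable from the original,
which is the point about order made in the reply.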
