lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Steven White <swhite4...@gmail.com>
Subject Re: analyzer, indexAnalyzer and queryAnalyzer
Date Mon, 04 May 2015 12:29:17 GMT
Thanks Doug.  This is extremely helpful.  It is much appreciated that you
took the time to write it all.

Do we have a Solr / Lucene wiki with such "did you know?" write ups?  If
not, just having this kind of knowledge in an email isn't good enough as it
won't be as searchable as a wiki.

Steve

On Wed, Apr 29, 2015 at 9:24 PM, Doug Turnbull <
dturnbull@opensourceconnections.com> wrote:

> So Solr has the idea of a query parser. The query parser is a convenient
> way of passing a search string to Solr and having Solr parse it into
> underlying Lucene queries: You can see a list of query parsers here
> http://wiki.apache.org/solr/QueryParser
>
> What this means is that the query parser does work to pull terms into
> individual clauses *before* analysis is run. It's a parsing layer that sits
> outside the analysis chain. This creates problems like the "sea biscuit"
> problem, whereby we declare "sea biscuit" as a query time synonym of
> "seabiscuit". As you may know synonyms are checked during analysis.
> However, if the query parser splits up "sea" from "biscuit" before running
> analysis, the query time analyzer will fail. The string "sea" is brought by
> itself to the query time analyzer and of course won't match "sea biscuit".
> Same with the string "biscuit" in isolation. If the full string "sea
> biscuit" was brought to the analyzer, it would see [sea] next to [biscuit]
> and declare it a synonym of seabiscuit. Thanks to the query parser, the
> analyzer has lost the association between the terms, and both terms aren't
> brought together to the analyzer.
>
> My colleague John Berryman wrote a pretty good blog post on this
>
> http://opensourceconnections.com/blog/2013/10/27/why-is-multi-term-synonyms-so-hard-in-solr/
>
> There's several solutions out there that attempt to address this problem.
> One from Ted Sullivan at Lucidworks
>
> https://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>
> Another popular one is the hon-lucene-synonyms plugin:
>
> http://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/search/FieldQParserPlugin.html
>
> Yet another work-around is to use the field query parser:
>
> http://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/search/FieldQParserPlugin.html
>
> I also tend to write my own query parsers, so on the one hand its annoying
> that query parsers have the problems above, on the flipside Solr makes it
> very easy to implement whatever parsing you think is appropriatte with a
> small bit of Java/Lucene knowledge.
>
> Hopefully that explanation wasn't too deep, but its an important thing to
> know about Solr. Are you asking out of curiosity, or do you have a specific
> problem?
>
> Thanks
> -Doug
>
> On Wed, Apr 29, 2015 at 6:32 PM, Steven White <swhite4141@gmail.com>
> wrote:
>
> > Hi Doug,
> >
> > I don't understand what you mean by the following:
> >
> > > For example, if a user searches for q=hot dogs&defType=edismax&qf=title
> > > body the *query parser* *not* the *analyzer* first turns the query
> into:
> >
> > If I have indexAnalyzer and queryAnalyzer in a fieldType that are 100%
> > identical, the example you provided, does it stand?  If so, why?  Or do
> you
> > mean something totally different by "query parser"?
> >
> > Thanks
> >
> > Steve
> >
> >
> > On Wed, Apr 29, 2015 at 4:18 PM, Doug Turnbull <
> > dturnbull@opensourceconnections.com> wrote:
> >
> > > *> 1) If the content of indexAnalyzer and queryAnalyzer are exactly the
> > > same,that's the same as if I have an analyzer only, right?*
> > > 1) Yes
> > >
> > > *>  2) Under the hood, all three are the same thing when it comes to
> what
> > > kind*
> > > *of data and configuration attributes can take, right?*
> > > 2) Yes. Both take in text and output a token stream.
> > >
> > > *>What I'm trying to figure out is this: beside being able to configure
> > a*
> > >
> > > *fieldType to have different analyzer setting at index and query time,
> > > thereis nothing else that's unique about each.*
> > >
> > > The only thing to look out for in Solr land is the query parser. Most
> > Solr
> > > query parsers treat whitespace as meaningful.
> > >
> > > For example, if a user searches for q=hot dogs&defType=edismax&qf=title
> > > body the *query parser* *not* the *analyzer* first turns the query
> into:
> > >
> > > (title:hot title:dog) | (body:hot body:dog)
> > >
> > > each word which *then *gets analyzed. This is because the query parser
> > > tries to be smart and turn "hot dog" into hot OR dog, or more
> > specifically
> > > making them two must clauses.
> > >
> > > This trips quite a few folks up, you can use the field query parser
> which
> > > uses the field as a phrase query. Hope that helps
> > >
> > >
> > > --
> > > *Doug Turnbull **| *Search Relevance Consultant | OpenSource
> Connections,
> > > LLC | 240.476.9983 | http://www.opensourceconnections.com
> > > Author: Taming Search <http://manning.com/turnbull> from Manning
> > > Publications
> > > This e-mail and all contents, including attachments, is considered to
> be
> > > Company Confidential unless explicitly stated otherwise, regardless
> > > of whether attachments are marked as such.
> > > On Wed, Apr 29, 2015 at 3:41 PM, Steven White <swhite4141@gmail.com>
> > > wrote:
> > >
> > > > Hi Everyone,
> > > >
> > > > Looking at Solr's schema.xml, there are three kind of analyzers:
> > > analyzer,
> > > > indexAnalyzer and queryAnalyzer.  I have two questions about them:
> > > >
> > > > 1) If the content of indexAnalyzer and queryAnalyzer are exactly the
> > > same,
> > > > that's the same as if I have an analyzer only, right?
> > > >
> > > > 2) Under the hood, all three are the same thing when it comes to what
> > > kind
> > > > of data and configuration attributes can take, right?
> > > >
> > > > What I'm trying to figure out is this: beside being able to
> configure a
> > > > fieldType to have different analyzer setting at index and query time,
> > > there
> > > > is nothing else that's unique about each.
> > > >
> > > > Thanks
> > > >
> > > > Steve
> > > >
> > >
> >
>
>
>
> --
> *Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections,
> LLC | 240.476.9983 | http://www.opensourceconnections.com
> Author: Taming Search <http://manning.com/turnbull> from Manning
> Publications
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless
> of whether attachments are marked as such.
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message