lucene-solr-user mailing list archives

From Doug Turnbull <dturnb...@opensourceconnections.com>
Subject Re: When is too many fields in "qf" is too many?
Date Tue, 26 May 2015 18:01:12 GMT
The way you have tie set is fine. Setting tie to 1 might give you reasonable
results. You could easily still have one field whose scores are consistently
an order of magnitude or two higher than the others, but try it out!
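
To make the effect of tie concrete, here is a minimal sketch of how a disjunction-max query combines per-field scores: the best field's score plus tie times the sum of the other fields' scores. The function name and the sample scores are hypothetical and only for illustration; real scores come from Lucene.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores the way a disjunction-max query does:
    best score plus tie times the sum of the remaining scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)

# Hypothetical per-field scores for one document; one field dominates.
scores = [8.0, 0.5, 0.25]

winner_takes_all = dismax_score(scores, tie=0.0)  # only the best field counts
summed = dismax_score(scores, tie=1.0)            # every matching field counts

print(winner_takes_all, summed)
```

Even with tie=1.0 the dominant field still contributes most of the total, which is the caveat above.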

BTW Anything you put in the URL can also be put into a request handler.
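
As a rough sketch of the URL route, the same parameters shown in the handler defaults could instead be sent with each request. The host, collection name, and the field names F1 F2 F3 below are placeholders (assumptions, not from your setup), and only the Python standard library is used.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute your own host and collection.
BASE_URL = "http://localhost:8983/solr/collection1/select"

def build_select_url(query, fields, tie=1.0, base_url=BASE_URL):
    """Build a select URL that carries the edismax parameters on the request
    instead of relying on request handler defaults."""
    params = {
        "q": query,
        "defType": "edismax",
        "qf": " ".join(fields),  # space-separated field list, as in the handler
        "tie": str(tie),
    }
    return base_url + "?" + urlencode(params)

url = build_select_url("apple", ["F1", "F2", "F3"])
print(url)
```

With a very long "qf" the handler defaults keep that list off the wire, which is why setting it in the request handler is attractive here.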

If you ever just want to have a 15 minute conversation via hangout, happy
to chat with you :) Might be fun to think through your prob together.

-Doug

On Tue, May 26, 2015 at 1:42 PM, Steven White <swhite4141@gmail.com> wrote:

> Hi Doug,
>
> I'm back to this topic.  Unfortunately, due to my DB structure and business
> needs, I will not be able to search against a single field (i.e., using
> copyField).  Thus, I have to use a list of fields via "qf".  Given this, I
> see you said above to use "tie=1.0".  Will that, more or less, address this
> scoring issue?  Should "tie=1.0" be set on the request handler like so:
>
>   <requestHandler name="/select" class="solr.SearchHandler">
>      <lst name="defaults">
>        <str name="echoParams">explicit</str>
>        <int name="rows">20</int>
>        <str name="defType">edismax</str>
>        <str name="qf">F1 F2 F3 F4 ... ... ...</str>
>        <float name="tie">1.0</float>
>        <str name="fl">_UNIQUE_FIELD_,score</str>
>        <str name="wt">xml</str>
>        <str name="indent">true</str>
>      </lst>
>   </requestHandler>
>
> Or must "tie" be passed as part of the URL?
>
> Thanks
>
> Steve
>
>
> On Wed, May 20, 2015 at 2:58 PM, Doug Turnbull <
> dturnbull@opensourceconnections.com> wrote:
>
> > Yeah a copyField into one could be a good space/time tradeoff. It can be
> > more manageable to use an all field for both relevancy and performance,
> > if you can handle the duplication of data.
> >
> > You could set tie=1.0, which effectively sums all the matches instead of
> > picking the best match. You'll still have cases where one field's score
> > might just happen to be far off of another, and thus dominating the
> > summation. But something easy to try if you want to keep playing with
> > dismax.
> >
> > -Doug
> >
> > On Wed, May 20, 2015 at 2:56 PM, Steven White <swhite4141@gmail.com>
> > wrote:
> >
> > > Hi Doug,
> > >
> > > Your blog write-up on relevancy is very interesting; I didn't know this.
> > > Looks like I have to go back to my drawing board and figure out an
> > > alternative solution: somehow get those group-based-fields data into a
> > > single field using copyField.
> > >
> > > Thanks
> > >
> > > Steve
> > >
> > > On Wed, May 20, 2015 at 11:17 AM, Doug Turnbull <
> > > dturnbull@opensourceconnections.com> wrote:
> > >
> > > > Steven,
> > > >
> > > > I'd be concerned about your relevance with that many qf fields. Dismax
> > > > takes a "winner takes all" point of view to search. Field scores can vary
> > > > by an order of magnitude (or even two) despite the attempts of query
> > > > normalization. You can read more here:
> > > >
> > > > http://opensourceconnections.com/blog/2013/07/02/getting-dissed-by-dismax-why-your-incorrect-assumptions-about-dismax-are-hurting-search-relevancy/
> > > >
> > > > I'm about to win the "blasphemer" merit badge, but ad-hoc all-field-like
> > > > searching over many fields is actually a good use case for Elasticsearch's
> > > > cross field queries.
> > > >
> > > > https://www.elastic.co/guide/en/elasticsearch/guide/master/_cross_fields_queries.html
> > > >
> > > > http://opensourceconnections.com/blog/2015/03/19/elasticsearch-cross-field-search-is-a-lie/
> > > >
> > > > It wouldn't be hard (and actually a great feature for the project) to get
> > > > the Lucene query associated with cross field search into Solr. You could
> > > > easily write a plugin to integrate it into a query parser:
> > > >
> > > > https://github.com/elastic/elasticsearch/blob/master/src/main/java/org/apache/lucene/queries/BlendedTermQuery.java
> > > >
> > > > Hope that helps
> > > > -Doug
> > > > --
> > > > *Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections,
> > > > LLC | 240.476.9983 | http://www.opensourceconnections.com
> > > > Author: Relevant Search <http://manning.com/turnbull> from Manning
> > > > Publications
> > > > This e-mail and all contents, including attachments, is considered to be
> > > > Company Confidential unless explicitly stated otherwise, regardless
> > > > of whether attachments are marked as such.
> > > >
> > > > On Wed, May 20, 2015 at 8:27 AM, Steven White <swhite4141@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > My solution requires that users in group-A can only search against a set
> > > > > of fields-A and users in group-B can only search against a set of fields-B,
> > > > > etc.  There can be several groups, as many as 100 or even more.  To meet
> > > > > this need, I build my search by passing in the list of fields via "qf".
> > > > > What goes into "qf" can be large: as many as 1500 fields, and each field
> > > > > name averages 15 characters long, so in effect the data passed via "qf"
> > > > > will be over 20K characters.
> > > > >
> > > > > Given the above, besides the fact that a search for "apple" translates
> > > > > into 20K characters passing over the network, what else within Solr and
> > > > > Lucene should I be worried about, if anything?  Will I hit some kind of a
> > > > > limit?  Will each search now require more CPU cycles?  Memory?  Etc.
> > > > >
> > > > > If the network traffic becomes an issue, my alternative solution is to
> > > > > create a /select handler for each group and in that handler list the
> > > > > fields under "qf".
> > > > >
> > > > > I have considered creating pseudo-fields for each group and then using
> > > > > copyField into that group.  During search, I can then "qf" against that
> > > > > one field.  Unfortunately, this is not ideal for my solution because the
> > > > > fields that go into each group dynamically change (at least once a month)
> > > > > and when they do change, I have to re-index everything (this I have to
> > > > > avoid) to sync that group-field.
> > > > >
> > > > > I'm using "qf" with edismax and my Solr version is 5.1.
> > > > >
> > > > > Thanks
> > > > >
> > > > > Steve
> > > > >
> > > >
> > >
> >
> >
> >
> >
>



