Date: Tue, 28 Nov 2017 01:03:00 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] [Commented] (SOLR-11662) Make overlapping query term scoring configurable per field type

[ https://issues.apache.org/jira/browse/SOLR-11662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267900#comment-16267900 ]

ASF GitHub Bot commented on SOLR-11662:
---------------------------------------

Github user softwaredoug commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/275#discussion_r153369232

    --- Diff: solr/core/src/java/org/apache/solr/schema/FieldType.java ---
    @@ -905,6 +905,7 @@ protected void checkSupportsDocValues() {
       protected static final String ENABLE_GRAPH_QUERIES = "enableGraphQueries";
       private static final String ARGS = "args";
       private static final String POSITION_INCREMENT_GAP = "positionIncrementGap";
    +  protected static final String SCORE_OVERLAPS = "scoreOverlaps";
    --- End diff --

I have been thinking a lot about this!

- Solr currently exposes per-field query configuration as a fieldType param, not at query time (see [autoGeneratePhraseQueries and enableGraphQueries](https://lucene.apache.org/solr/guide/6_6/field-type-definitions-and-properties.html#general-properties)).
- Solr doesn't yet have a way to pass per-field configuration at query time (my email about multiple analyzers proposes one system for doing this).

To do the latter, ideally you'd have an API that lets you see multiple views/configs of the same field, such as the following, which would search two query-time versions of the actor field:

`q=action movies&qf=actor_syn actor_nosyn^10 title text&defType=edismax&qf.actor_nosyn.field=actor&qf.actor_nosyn.analyzer=without_synonyms&qf.actor_syn.field=actor&qf.actor_syn.analyzer=with_synonyms&qf.actor_syn.scoreOverlaps=pick_best`

I think this sort of syntax could be extremely powerful, and it would also cover configuring multiple query-time analyzers. But that is a bridge too far for this PR...

> Make overlapping query term scoring configurable per field type
> ---------------------------------------------------------------
>
>                 Key: SOLR-11662
>                 URL: https://issues.apache.org/jira/browse/SOLR-11662
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Doug Turnbull
>             Fix For: 7.2, master (8.0)
>
>
> This patch customizes the query-time behavior when query terms overlap positions. Right now the only option is SynonymQuery. This is a fantastic default & improvement on past versions. However, there are use cases where terms overlap positions but don't carry exact synonymy relationships. Often synonyms (or other analyzers) are actually used to model hypernym/hyponym relationships. So the individual term scores matter, with terms of higher specificity (hyponyms) scoring higher than terms of lower specificity (hypernyms).
> This patch adds the fieldType setting scoreOverlaps, as in:
> {code:java}
>
> {code}
> Valid values for scoreOverlaps are:
> *as_one_term*
> Default, most synonym use cases.
> Uses SynonymQuery.
> Treats all terms as if they're exactly equivalent, with document frequency from underlying terms blended.
> *pick_best*
> For a given document, score using the best-scoring synonym (i.e. dismax over generated terms).
> Useful when synonyms are not exactly equivalent and are instead used to model hypernym/hyponym relationships, such as expanding to synonyms where term scores will reflect that quality.
> E.g. this query-time expansion:
> tabby => tabby, cat, animal
> Searching "text" generates the dismax (text:tabby | text:cat | text:animal)
> *as_distinct_terms*
> (The pre-6.0 behavior.)
> Compromise between pick_best and as_one_term.
> Appropriate when synonyms reflect a hypernym/hyponym relationship, but lets scores stack, so the more tabby, cat, or animal a document contains the better, with a bias towards the term with the highest specificity.
> Terms are turned into a boolean OR query, with document frequencies not blended.
> E.g. this query-time expansion:
> tabby => tabby, cat, animal
> Searching "text" generates the boolean query (text:tabby text:cat text:animal)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
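
To make the three scoreOverlaps modes from the issue description concrete, here is a toy Java sketch of how each one might combine per-term scores for the tabby => tabby, cat, animal expansion. It is not the patch's code and uses no Lucene or Solr classes. The class name, the simplified tf*idf formula, the collection size, and the term statistics are all made up for illustration. The way as_one_term blends statistics here (summed term frequency, largest document frequency) is an assumption, not something the issue specifies.

```java
public class OverlapScoringSketch {

    /** Hypothetical per-term statistics for one document. */
    public static final class TermStats {
        final int termFreq; // occurrences of the term in this document
        final int docFreq;  // number of documents that contain the term

        public TermStats(int termFreq, int docFreq) {
            this.termFreq = termFreq;
            this.docFreq = docFreq;
        }
    }

    /** Hypothetical collection size used by the toy idf below. */
    static final int NUM_DOCS = 1000;

    /** Simplified tf*idf; a stand-in for whatever similarity the field really uses. */
    public static double score(int termFreq, int docFreq) {
        if (termFreq == 0) {
            return 0.0; // term absent from the document
        }
        double idf = Math.log(1.0 + (double) NUM_DOCS / docFreq);
        return Math.sqrt(termFreq) * idf;
    }

    /** as_one_term: score all overlapping terms as one term with blended statistics (assumed blend). */
    public static double asOneTerm(TermStats[] terms) {
        int blendedTermFreq = 0;
        int blendedDocFreq = 0;
        for (TermStats t : terms) {
            blendedTermFreq += t.termFreq;
            blendedDocFreq = Math.max(blendedDocFreq, t.docFreq);
        }
        return score(blendedTermFreq, blendedDocFreq);
    }

    /** pick_best: dismax-style, keep only the best-scoring term. */
    public static double pickBest(TermStats[] terms) {
        double best = 0.0;
        for (TermStats t : terms) {
            best = Math.max(best, score(t.termFreq, t.docFreq));
        }
        return best;
    }

    /** as_distinct_terms: boolean OR style, per-term scores add up. */
    public static double asDistinctTerms(TermStats[] terms) {
        double sum = 0.0;
        for (TermStats t : terms) {
            sum += score(t.termFreq, t.docFreq);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up statistics for one document: tabby (rare), cat, animal (common, absent here).
        TermStats[] terms = {
            new TermStats(1, 10),
            new TermStats(2, 100),
            new TermStats(0, 500)
        };
        System.out.println("as_one_term       = " + asOneTerm(terms));
        System.out.println("pick_best         = " + pickBest(terms));
        System.out.println("as_distinct_terms = " + asDistinctTerms(terms));
    }
}
```

With these made-up numbers, pick_best returns only the tabby score, as_distinct_terms adds the cat score on top of it, and as_one_term comes out lowest because the blended document frequency is dominated by the common term animal.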