From: Erick Erickson
Date: Thu, 13 Apr 2017 13:31:58 -0700
Subject: Re: keywords not found - google like feature
To: solr-user

bq: he searches he wants to know what keywords were not found in results.

We need to distinguish between words not found in the returned documents
and words not found at all. The solutions suggested so far tell you about
the documents returned. If the keyword was found in a document that was
not returned (say the 11th doc when you have rows set to 10), you'd have
no way to know that the keyword was actually in _some_ document, just not
one of the top N returned. So if your question is really "I want to know
what terms were not found in any document", they won't help.

Another rather ugly solution would be to facet on the keywords. You'd add
some facet clauses like:

facet.query=keywordfield:keyword1&
facet.query=keywordfield:keyword2&
facet.query=keywordfield:keyword3&
facet.query=keywordfield:keyword4

The counts returned for those facet queries would represent the total
number of matching documents having each keyword, regardless of whether
they were in the top N returned. For a bazillion docs this is probably
unworkable, I admit.

Do _not_ facet on keywordfield as in &facet.field=keywordfield unless you
are certain it has a pretty low cardinality, as in maybe 100 or so unique
values. Beyond that, test. Faceting on a field with a million unique
values corpus-wide is just asking for trouble.

Best,
Erick

On Thu, Apr 13, 2017 at 1:12 PM, Markus Jelsma wrote:
> Hi - That is not going to be that easy out-of-the-box. In regular setups
> the output you find in debugging mode contains stemmed versions of the
> original input text.
>
> At best you use KeepWordsFilterFactory to get unstemmed terms, but those
> tokens would, in usual cases, also have passed through filters such as
> LowerCase, AsciiFolding or some language-specific normalizer, causing
> them not to match most of the original input tokens.
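To illustrate Markus's point: Solr's debug output reports the analyzed form of each word, so telling the user which of _their_ words were missing requires a mapping from each original word to its analyzed term. The sketch below is a minimal illustration under stated assumptions: toy_analyze is a hypothetical stand-in that only approximates lowercasing, ASCII folding and a crude plural-stripping stemmer (it is not your schema's real analysis chain), and missing_original_words is likewise a hypothetical helper, not a Solr API.

```python
import unicodedata
from typing import List, Set


def toy_analyze(word: str) -> str:
    """Hypothetical stand-in for a Solr analysis chain; NOT Solr's real one.

    Approximates LowerCase + ASCII folding + a naive plural-stripping stemmer.
    """
    # Lowercase, decompose accented characters, then drop the combining accents.
    decomposed = unicodedata.normalize("NFKD", word.lower())
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Crude "stemming": strip one trailing plural s (placeholder for a real stemmer).
    if len(folded) > 3 and folded.endswith("s") and not folded.endswith("ss"):
        folded = folded[:-1]
    return folded


def missing_original_words(user_words: List[str], matched_terms: Set[str]) -> List[str]:
    """Return the words the user typed whose analyzed form matched nothing.

    matched_terms holds analyzed terms, e.g. as extracted from debug output.
    """
    return [w for w in user_words if toy_analyze(w) not in matched_terms]


if __name__ == "__main__":
    print(missing_original_words(["Cafés", "Spring", "Trump"], {"cafe", "spring"}))
```

Here "Cafés" folds to "cafe" and matches, so only "Trump" is reported, in the spelling the user typed. In a real setup you would rather ask Solr itself for the analyzed forms (for example through its field analysis handler) than re-implement the chain.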
>
> Regards,
> Markus
>
> -----Original message-----
>> From:David Hastings
>> Sent: Thursday 13th April 2017 22:05
>> To: solr-user@lucene.apache.org
>> Subject: Re: keywords not found - google like feature
>>
>> Another ugly solution would be to use the debugQuery=true option, then
>> analyze the results in the explain output; if the word isn't in the
>> explain, then you strike it out.
>>
>> On Thu, Apr 13, 2017 at 4:01 PM, Markus Jelsma wrote:
>>
>> > Hi - There is no such feature out-of-the-box in Solr. But you could
>> > probably modify a highlighter implementation to return this information;
>> > the highlighter is the component that comes closest to that feature.
>> >
>> > Regards,
>> > Markus
>> >
>> > -----Original message-----
>> > > From:Nilesh Kamani
>> > > Sent: Thursday 13th April 2017 21:52
>> > > To: solr-user@lucene.apache.org
>> > > Subject: Re: keywords not found - google like feature
>> > >
>> > > Here is the example.
>> > > https://www.google.ca/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=off&q=solr+spring+trump
>> > >
>> > > You will see this under the search results: "Missing: trump".
>> > >
>> > > I am not asking for a visual representation of such a feature.
>> > > Is there any way Solr can return such info in the response?
>> > > My client has this specific requirement that when he searches he wants
>> > > to know what keywords were not found in results.
>> > >
>> > > On Thu, Apr 13, 2017 at 3:34 PM, Alexandre Rafalovitch <arafalov@gmail.com> wrote:
>> > >
>> > > > Are you asking for a visual representation or an actual feature?
>> > > > Because if all your keywords/clauses are optional (default SHOULD),
>> > > > then Solr automatically tries to match the maximum number of them and
>> > > > then fewer and fewer. So, if no document matches all the words, it
>> > > > will return results that match fewer of them.
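David's debugQuery idea (the same "recover it from debug info" route Alexandre mentions) might look roughly like the sketch below. It assumes a JSON response to a debugQuery=true request, where response["debug"]["explain"] maps each returned doc id to a plain-text Lucene explanation in which every matching term shows up as weight(field:term in N). That text varies with the Solr/Lucene version, the similarity and the query type (phrases, multi-field edismax), so the regex is an assumption to check against your own output. Per Markus's caveat the query terms must be passed in analyzed form, and per Erick's caveat this only covers the documents actually returned. The function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set

# Assumed shape of a term-level clause in explain text, e.g. "weight(text:solr in 3)".
WEIGHT_RE = re.compile(r"weight\(([^:\s()]+):([^\s()]+) in \d+\)")


def matched_terms(explain_text: str, field: str) -> Set[str]:
    """Analyzed terms of `field` that contributed to one document's score."""
    return {term for fld, term in WEIGHT_RE.findall(explain_text) if fld == field}


def missing_per_doc(explain: Dict[str, str], field: str,
                    query_terms: Iterable[str]) -> Dict[str, List[str]]:
    """Map each returned doc id to the query terms absent from its explanation.

    `explain` is response["debug"]["explain"]; `query_terms` must already be
    in analyzed (lowercased, stemmed, ...) form to line up with the explain text.
    """
    terms = list(query_terms)
    result = {}
    for doc_id, text in explain.items():
        found = matched_terms(text, field)
        result[doc_id] = [t for t in terms if t not in found]
    return result
```

The per-document lists are the "strike-through" candidates for each hit; a term missing from every returned document is still not proof that it is absent from the whole index.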
>> > > >
>> > > > And the words not matched are effectively your strike-through
>> > > > negative space. You can probably recover them from the debug info,
>> > > > though it will not be pretty and perhaps a bit slower.
>> > > >
>> > > > The real issue here is ranking. Does Google do something special with
>> > > > ranking when they do strike-through? Do they do some grouping and
>> > > > ranking within groups, not just a global one?
>> > > >
>> > > > The biggest question is, of course, what your business objective is,
>> > > > as opposed to a look-alike one. Explaining your needs through a
>> > > > similarity with another product's secret implementation is a long way
>> > > > to get there: too much precision is lost in each round of explanation.
>> > > >
>> > > > Regards,
>> > > > Alex.
>> > > > ----
>> > > > http://www.solr-start.com/ - Resources for Solr users, new and experienced
>> > > >
>> > > > On 13 April 2017 at 20:49, Nilesh Kamani wrote:
>> > > > > Hello All,
>> > > > >
>> > > > > When we search Google, it sometimes returns results that mention
>> > > > > keywords not found (shown as strike-through).
>> > > > >
>> > > > > Does Solr provide such a feature?
>> > > > >
>> > > > > Thanks,
>> > > > > Nilesh Kamani
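Erick's facet.query approach can be sketched in Python as follows. It assumes a wt=json response in which facet_counts.facet_queries maps each facet query string to its document count; the field name keywordfield, the helper names, and the idea that each keyword is a single term needing no query-syntax escaping are all assumptions for illustration.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlencode


def build_facet_params(user_query: str, field: str, keywords: Iterable[str]) -> str:
    """Query string with one facet.query per keyword (assumes simple, unescaped terms)."""
    params = [("q", user_query), ("facet", "true"), ("wt", "json")]
    # One facet.query per keyword; Solr reports a count for each of them.
    params += [("facet.query", f"{field}:{kw}") for kw in keywords]
    return urlencode(params)


def keywords_not_found(response: Dict, field: str, keywords: Iterable[str]) -> List[str]:
    """Keywords whose facet.query count is zero in a parsed wt=json response."""
    counts = response["facet_counts"]["facet_queries"]
    return [kw for kw in keywords if counts.get(f"{field}:{kw}", 0) == 0]
```

Facet counts are computed over every document matching q, not just the returned rows, so as long as q lets each keyword match on its own (for example an OR query), a count of 0 means no document in the index contains that keyword. As Erick notes, this may not scale to very large indexes.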