lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: keywords not found - google like feature
Date Thu, 13 Apr 2017 20:31:58 GMT
bq:  he searches he wants to know what keywords were not found in results.

We need to distinguish between words not found in the returned
documents and words not found at all. The solutions above tell you
about documents returned. If the keyword was found in a document that
was not returned (say the 11th doc when you have rows set to 10), you'd
have no way to know that the keyword was actually in _some_ document,
just not one of the top N returned.

So if your question is really "I want to know what terms were not
found in any document", those solutions won't help.

Another rather ugly solution would be to facet on the keywords. You'd
add some facet clauses like:
facet.query=keywordfield:keyword1&
facet.query=keywordfield:keyword2&
facet.query=keywordfield:keyword3&
facet.query=keywordfield:keyword4

The counts in those returned facets would represent the total
number of documents having that keyword, regardless of whether they
were in the top N returned. For a bazillion docs this is probably
unworkable, I admit.
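
A minimal sketch of that facet.query approach, for illustration only. The
helper names, the field name "keywordfield", and the sample response are
assumptions, not anything from the original poster's setup. It also assumes
facet=true and wt=json are sent, and that the counts come back under
facet_counts.facet_queries keyed by the raw facet.query string, as in
Solr's standard JSON response.

```python
from urllib.parse import urlencode


def build_facet_params(user_query, keywords, field="keywordfield", rows=10):
    """Build a query string with one facet.query clause per keyword.

    Keywords are used as-is; terms containing Solr query syntax
    characters would need escaping first (not handled here).
    """
    params = [
        ("q", user_query),
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
    ]
    # Repeated facet.query parameters, one per keyword.
    params += [("facet.query", f"{field}:{kw}") for kw in keywords]
    return urlencode(params)


def keywords_not_in_corpus(response, keywords, field="keywordfield"):
    """Return keywords whose facet.query count is 0 (or missing).

    `response` is the parsed JSON body of the Solr response.
    """
    counts = response.get("facet_counts", {}).get("facet_queries", {})
    return [kw for kw in keywords if counts.get(f"{field}:{kw}", 0) == 0]


if __name__ == "__main__":
    kws = ["solr", "spring", "trump"]
    print(build_facet_params("solr spring trump", kws))
    # Hypothetical response fragment, made up for this example.
    sample = {
        "facet_counts": {
            "facet_queries": {
                "keywordfield:solr": 42,
                "keywordfield:spring": 7,
                "keywordfield:trump": 0,
            }
        }
    }
    print(keywords_not_in_corpus(sample, kws))
```

A count of 0 here means the keyword is in no document at all, which is a
different statement from "not in the top N rows".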

Do _not_ facet on keywordfield as in &facet.field=keywordfield unless you
are certain it has a pretty low cardinality, as in maybe 100 or so.
Beyond that, test. Faceting on a field with a million unique values
corpus-wide is just asking for trouble.

Best,
Erick

On Thu, Apr 13, 2017 at 1:12 PM, Markus Jelsma
<markus.jelsma@openindex.io> wrote:
> Hi - That is not going to be that easy out-of-the-box. In regular setups the output you
> find in debugging mode contains stemmed versions of the original input text.
>
> At best you use KeepWordFilterFactory to get unstemmed terms, but those tokens would,
> in usual cases, also have passed through filters such as LowerCase, AsciiFolding or some
> language-specific normalizer, causing them not to match most original input tokens.
>
> Regards,
> Markus
>
>
>
> -----Original message-----
>> From:David Hastings <hastings.recursive@gmail.com>
>> Sent: Thursday 13th April 2017 22:05
>> To: solr-user@lucene.apache.org
>> Subject: Re: keywords not found - google like feature
>>
>> Another ugly solution would be to use the debugQuery=true option, then
>> analyze the results in explain. If the word isn't in the explain, then you
>> strike it out.
>>
>> On Thu, Apr 13, 2017 at 4:01 PM, Markus Jelsma <markus.jelsma@openindex.io>
>> wrote:
>>
>> > Hi - There is no such feature out-of-the-box in Solr. But you probably
>> > could modify a highlighter implementation to return this information, the
>> > highlighter is the component that comes closest to that feature.
>> >
>> > Regards,
>> > Markus
>> >
>> >
>> >
>> > -----Original message-----
>> > > From:Nilesh Kamani <nilesh.kamani@gmail.com>
>> > > Sent: Thursday 13th April 2017 21:52
>> > > To: solr-user@lucene.apache.org
>> > > Subject: Re: keywords not found - google like feature
>> > >
>> > > Here is the example.
>> > > https://www.google.ca/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=off&q=solr+spring+trump
>> > >
>> > > You will see this under search results.  Missing: trump
>> > >
>> > > I am not asking for a visual representation of such a feature.
>> > > Is there any way Solr returns such info in the response?
>> > > My client has this specific requirement: when he searches, he wants to
>> > > know what keywords were not found in the results.
>> > >
>> > >
>> > >
>> > >
>> > > On Thu, Apr 13, 2017 at 3:34 PM, Alexandre Rafalovitch <arafalov@gmail.com> wrote:
>> > >
>> > > > Are you asking for a visual representation or an actual feature? Because
>> > > > if all your keywords/clauses are optional (default SHOULD) then Solr
>> > > > automatically tries to match the maximum number of them and then fewer
>> > > > and fewer. So, if not all words match, it will return results that match
>> > > > fewer words.
>> > > >
>> > > > And words not-matched is effectively your strike-through negative
>> > > > space. You can probably recover that from debug info, though it will
>> > > > be not pretty and perhaps a bit slower.
>> > > >
>> > > > The real issue here is ranking. Does Google do something special with
>> > > > ranking when they do strike through. Do they do some grouping and
>> > > > ranking within groups, not just a global one?
>> > > >
>> > > > The biggest question is - of course - what is your business - as
>> > > > opposed to look-alike - objective. Because explaining your needs
>> > > > through a similarity with another product's secret implementation is
>> > > > a long way to get there. Too much precision loss in each explanation
>> > > > round.
>> > > >
>> > > > Regards,
>> > > >    Alex.
>> > > > ----
>> > > > http://www.solr-start.com/ - Resources for Solr users, new and experienced
>> > > >
>> > > >
>> > > > On 13 April 2017 at 20:49, Nilesh Kamani <nilesh.kamani@gmail.com> wrote:
>> > > > > Hello All,
>> > > > >
>> > > > > When we search Google, sometimes Google returns results with a
>> > > > > mention of keywords not found (shown as strike-through).
>> > > > >
>> > > > > Does Solr provide such a feature?
>> > > > >
>> > > > >
>> > > > > Thanks,
>> > > > > Nilesh Kamani
>> > > >
>> > >
>> >
>>
