lucene-solr-user mailing list archives

From Alistair Young <alistair.yo...@uhi.ac.uk>
Subject Re: phrase matches returning near matches
Date Tue, 16 Jun 2015 15:25:13 GMT
Yes, probably not a bug. Highlighting is on, but nothing is highlighted.
Perhaps this text is triggering it?

'consider the impacts of land management changes'

That would seem reasonable. It's not a direct match, so there is no highlighting
(highlighting does work on a direct match), but 'management changes'
must be near enough to 'manage change' to trigger a result.
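
To illustrate why 'management changes' can satisfy the phrase query "manage change" while a document with many words between the two terms should not, here is a minimal Python sketch. It is not Solr's analysis chain: `toy_stem` and its suffix list are hypothetical stand-ins for a real stemmer, chosen only so that the words in this thread reduce to `manag` and `chang` as shown in the parsed query. The matching rule is the one Erick describes below: without slop, the stemmed terms must sit at adjacent positions.

```python
import re
from typing import List

# Hypothetical suffix list for illustration only; Solr's real stemmers
# (e.g. Porter) use far more elaborate rules.
SUFFIXES = ("ement", "ers", "ing", "es", "e", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text: str) -> List[str]:
    """Lowercase, split on non-letters, and stem; list index = token position."""
    return [toy_stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def phrase_match_positions(doc: str, phrase: str) -> List[int]:
    """Return start positions where the stemmed phrase occupies adjacent positions."""
    doc_tokens = analyze(doc)
    phrase_tokens = analyze(phrase)
    n = len(phrase_tokens)
    return [
        i
        for i in range(len(doc_tokens) - n + 1)
        if doc_tokens[i : i + n] == phrase_tokens
    ]


if __name__ == "__main__":
    query = "manage change"
    for doc in (
        "consider the impacts of land management changes",
        "this will help in your management of many changes",
        "are managers changing?",
    ):
        print(doc, "->", phrase_match_positions(doc, query))
```

Under these assumptions the first and third texts match (the stems are adjacent) and the second does not, which is consistent with the idea that some other part of the field, rather than the widely separated words, produced the hit.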

Alistair

-- 
mov eax,1
mov ebx,0
int 80h




On 16/06/2015 16:18, "Erick Erickson" <erickerickson@gmail.com> wrote:

>I agree with Alessandro: the behavior you're describing
>is _not_ correct at all given your description. So either
>
>1> There's something "interesting" about your configuration
>    that doesn't seem important that you haven't told us,
>    although what it could be is a mystery to me too ;)
>
>2> It's matching on something else. Note that the
>    phrase has been stemmed, so something in there
>    besides management might stem to manag and/or
>    something other than changes might stem to chang
>    and the two of _them_ happen to be next to each
>    other. "are managers changing?" for instance. Or
>    even something less likely. Perhaps turn on
>    highlighting and see if it pops out?
>
>3> You've uncovered a bug. Although I suspect others
>    would have reported it and the unit tests would have
>    barfed all over the place.
>
>One other thing you can do: go to the admin/analysis
>page and turn on the "verbose" check box. Put
>"management is undergoing many changes"
>in both the query and index boxes. The result (it's
>kind of hard to read I'll admit) will include the position
>of each token after all the analysis is done. Phrase
>queries (without slop) should only be matching adjacent
>positions. So the question is whether the position info
>"looks correct"....
>
>Best,
>Erick
>
>On Tue, Jun 16, 2015 at 4:40 AM, Alessandro Benedetti
><benedetti.alex85@gmail.com> wrote:
>> According to your debug output you are using the default Lucene query parser.
>> This surprises me, as I would expect that query to match only with distance 0
>> between the two terms.
>>
>> Are you sure nothing else in that field matches the phrase query?
>>
>> From the documentation
>>
>> "Lucene supports finding words that are within a specific distance away. To do
>> a proximity search use the tilde, "~", symbol at the end of a Phrase. For
>> example to search for "apache" and "jakarta" within 10 words of each
>> other in a document use the search:
>>
>> "jakarta apache"~10"
>>
>>
>> Cheers
>>
>>
>> 2015-06-16 11:33 GMT+01:00 Alistair Young <alistair.young@uhi.ac.uk>:
>>
>>> It's a useful behaviour. I'd just like to understand where it's deciding
>>> the document is relevant. Debug output is:
>>>
>>> <lst name="debug">
>>>   <str name="rawquerystring">dc.description:"manage change"</str>
>>>   <str name="querystring">dc.description:"manage change"</str>
>>>   <str name="parsedquery">PhraseQuery(dc.description:"manag chang")</str>
>>>   <str name="parsedquery_toString">dc.description:"manag chang"</str>
>>>   <lst name="explain">
>>>     <str name="tst:test">
>>> 1.2008798 = (MATCH) weight(dc.description:"manag chang" in 221) [DefaultSimilarity], result of:
>>>   1.2008798 = fieldWeight in 221, product of:
>>>     1.0 = tf(freq=1.0), with freq of:
>>>       1.0 = phraseFreq=1.0
>>>     9.6070385 = idf(), sum of:
>>>       4.0365543 = idf(docFreq=101, maxDocs=2125)
>>>       5.5704846 = idf(docFreq=21, maxDocs=2125)
>>>     0.125 = fieldNorm(doc=221)
>>> </str>
>>>   </lst>
>>>   <str name="QParser">LuceneQParser</str>
>>>   <lst name="timing">
>>>     <double name="time">41.0</double>
>>>     <lst name="prepare">
>>>       <double name="time">3.0</double>
>>>       <lst name="query">
>>>         <double name="time">0.0</double>
>>>       </lst>
>>>       <lst name="facet">
>>>         <double name="time">0.0</double>
>>>       </lst>
>>>       <lst name="mlt">
>>>         <double name="time">0.0</double>
>>>       </lst>
>>>       <lst name="highlight">
>>>         <double name="time">0.0</double>
>>>       </lst>
>>>       <lst name="stats">
>>>         <double name="time">0.0</double>
>>>       </lst>
>>>       <lst name="debug">
>>>         <double name="time">0.0</double>
>>>       </lst>
>>>     </lst>
>>>     <lst name="process">
>>>       <double name="time">35.0</double>
>>>       <lst name="query">
>>>         <double name="time">0.0</double>
>>>       </lst>
>>>       <lst name="facet">
>>>         <double name="time">0.0</double>
>>>       </lst>
>>>       <lst name="mlt">
>>>         <double name="time">0.0</double>
>>>       </lst>
>>>       <lst name="highlight">
>>>         <double name="time">0.0</double>
>>>       </lst>
>>>       <lst name="stats">
>>>         <double name="time">0.0</double>
>>>       </lst>
>>>       <lst name="debug">
>>>         <double name="time">35.0</double>
>>>       </lst>
>>>     </lst>
>>>   </lst>
>>> </lst>
>>>
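
The numbers in the explain output above can be reproduced by hand. The sketch below assumes the classic TF-IDF formulas used by Lucene's DefaultSimilarity, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and takes docFreq, maxDocs, phraseFreq and fieldNorm directly from the debug output. The function and variable names are made up for illustration, and the output does not say which docFreq belongs to which term, so they are labelled a and b.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf as assumed for DefaultSimilarity: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# Values copied from the explain output for document 221.
idf_a = classic_idf(101, 2125)   # reported as 4.0365543
idf_b = classic_idf(21, 2125)    # reported as 5.5704846
tf = math.sqrt(1.0)              # phraseFreq = 1.0
field_norm = 0.125

# For a phrase query the idf values of the terms are summed.
score = tf * (idf_a + idf_b) * field_norm

if __name__ == "__main__":
    print(round(idf_a, 6), round(idf_b, 6), round(score, 6))
```

The result agrees with the reported 1.2008798 to within single-precision rounding. phraseFreq=1.0 means the stemmed phrase was found exactly once in that field, so the scorer did see `manag` and `chang` at adjacent positions somewhere in document 221.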
>>>
>>> thanks,
>>>
>>> Alistair
>>>
>>> On 16/06/2015 11:26, "Alessandro Benedetti" <benedetti.alex85@gmail.com> wrote:
>>>
>>> >Can you show us how the query is parsed?
>>> >You didn't tell us anything about the query parser you are using.
>>> >Enabling debugQuery=true will show you how the query is parsed, and this
>>> >will be quite useful for us.
>>> >
>>> >
>>> >Cheers
>>> >
>>> >2015-06-16 11:22 GMT+01:00 Alistair Young <alistair.young@uhi.ac.uk>:
>>> >
>>> >> Hiya,
>>> >>
>>> >> I've been looking for documentation that would point to where I could
>>> >> modify or explain why 'near neighbours' are returned from a phrase search.
>>> >> If I search for:
>>> >>
>>> >> "manage change"
>>> >>
>>> >> I get back a document that contains "this will help in your management of
>>> >> <lots more words...> changes". It's relevant but I'd like to understand why
>>> >> Solr is returning it. Is it a combination of fuzzy/slop? The distance
>>> >> between the two variations of the two words in the document is quite large.
>>> >>
>>> >> thanks,
>>> >>
>>> >> Alistair
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>> >--
>>> >--------------------------
>>> >
>>> >Benedetti Alessandro
>>> >Visiting card : http://about.me/alessandro_benedetti
>>> >
>>> >"Tyger, tyger burning bright
>>> >In the forests of the night,
>>> >What immortal hand or eye
>>> >Could frame thy fearful symmetry?"
>>> >
>>> >William Blake - Songs of Experience -1794 England
>>>
>>>
>>
>>
