Mailing-List: contact solr-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: solr-user@lucene.apache.org
MIME-Version: 1.0
In-Reply-To: <56C6DF56.9060005@helsinki.fi>
References: <1455559819167-4257420.post@n3.nabble.com>
	<CALGoTy31VEsTKH4A7UfLww-Odpq2L2nu+gdrOS1JBdrohycsYg@mail.gmail.com>
	<1455599921170-4257510.post@n3.nabble.com>
	<56C2D64F.9040801@sematext.com>
	<CAB-fSbwR6=yJiZO_jPHw=XD_W4puiyk+0GSg51uJVfKEGk8Ydw@mail.gmail.com>
	<CAG3m1h5XqZPq6yXH4D-mXub=kpdZpO0iHQ=+k7AmjFyAWZ1Fxw@mail.gmail.com>
	<CAB-fSbz-Dx1d=nRkwfFMiR94E76Umb-nf7SSP59Ry_AmmYECVQ@mail.gmail.com>
	<CAG3m1h4JVHebQa5pN7oqFuszXVWeydgZ1L4FNFgqrdfsrAJ9uw@mail.gmail.com>
	<CALGoTy1tHewrC7zwCNJ6O7bndENYs8qGQqwWA_qWqf5znNsOZA@mail.gmail.com>
	<CAB-fSbwXU2N7h7TLb69cj9VUmSzHYvpTGBjv=DcQQCxnnDTX6Q@mail.gmail.com>
	<1455694681103-4257782.post@n3.nabble.com>
	<56C5A263.6080708@sematext.com>
	<CALGoTy0MAr3hrGps0aVkiaxhT3hi_gw8+-C=NbWpAAtsSDPgQg@mail.gmail.com>
	<CAB-fSbwZyCsa4iGzsFu0mY7r_ER7xwKVVD86Ae6bD51wVn-iOQ@mail.gmail.com>
	<CALGoTy3PrnhgVCbhDW_38bs3MfWgT7DusgbTTi1+VgG3DDSw7Q@mail.gmail.com>
	<56C6DF56.9060005@helsinki.fi>
Date: Fri, 19 Feb 2016 10:25:49 +0000
Message-ID: 
 <CAB-fSbzTUeJKsgLOiT_B5Sg3+RZnqo=BDoBHgTq1Xm67gNdKwQ@mail.gmail.com>
Subject: Re: SOLR ranking
From: Alessandro Benedetti <abenedetti@apache.org>
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Content-Type: multipart/alternative; boundary=001a113cc1142fe6e5052c1ce863

--001a113cc1142fe6e5052c1ce863
Content-Type: text/plain; charset=UTF-8

Ok Binoy, now it is clearer :)
Yes, if we add sorting and faceting as additional optional requirements, doing
2 queries could be a perilous path!

Cheers

On 19 February 2016 at 09:24, Ere Maijala <ere.maijala@helsinki.fi> wrote:

> If he needs faceting or something (I didn't see that specified), doing two
> queries won't do, of course..
>
> --Ere
>
>
> On 19.2.2016 at 2.22, Binoy Dalal wrote:
>
>> Hi Alessandro,
>> Don't get me wrong. Using mm, ps and pf can and absolutely will solve his
>> problem.
>>
>> Like I said above, my solution is meant to be a quick and dirty fix. It's
>> really not that complex and shouldn't take more than an hour to set up at
>> the app level. Moreover, I suggested it because he said it was urgent for
>> him and setting up a proper config with mm, pf and ps might take him much
>> longer.
>>
>> Hope this clears things up :)
>>
>> On Fri, 19 Feb 2016, 05:31 Alessandro Benedetti <abenedetti@apache.org>
>> wrote:
>>
>>> Hey Binoy,
>>> I can't understand why such complexity is needed, to be honest :/
>>> Can you explain to me why playing with:
>>>
>>> edismax
>>> mm (the percentage of query terms you want to be in the results)
>>> pf (the fields you want to be boosted if the phrase matches)
>>> ps (the slop to allow)
>>>
>>> should not solve the problem instead of the 2-phase query?
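
For reference, here is a minimal sketch of the kind of edismax request I mean,
written in Python. The host and core come from the URL quoted further down and
the field names from the schema quoted there; the function name
build_edismax_url and the mm and ps values are only illustrative assumptions,
not a recommended tuning:

```python
from urllib.parse import urlencode

# Base URL taken from the request quoted later in this thread.
SOLR_SELECT = "http://localhost:8983/solr/tgl/select"


def build_edismax_url(user_query, mm="100%", ps=2):
    """Build a select URL that boosts phrase matches over loose term matches.

    build_edismax_url is a hypothetical helper; mm and ps defaults are
    placeholders chosen for illustration only.
    """
    params = {
        "q": user_query,
        "defType": "edismax",
        # qf: fields searched term by term.
        "qf": "topic_title^100 subtopic_title^40 index_term^20 drug^15 content^3",
        # mm: how many of the query terms must match.
        "mm": mm,
        # pf: fields boosted when the whole query matches as a phrase.
        # Field names are spelled as declared in the quoted schema.
        "pf": "topTitle^200 subTopTitle^80 indTerm^40 drugString^30 tglData^6",
        # ps: slop allowed for the pf phrase match.
        "ps": str(ps),
        # Ask Solr to include the parsed query and score explanations.
        "debugQuery": "true",
    }
    return SOLR_SELECT + "?" + urlencode(params)


print(build_edismax_url("rheumatoid arthritis"))
```

Everything stays in a single request, so sorting, paging and faceting keep
working on one result set.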
>>>
>>> Cheers
>>>
>>> On 18 February 2016 at 18:09, Binoy Dalal <binoydalal93@gmail.com>
>>> wrote:
>>>
>>>> Here's an alternative solution that may be of some help.
>>>> Here I'm assuming that you are not directly outputting the search results
>>>> to the user and have some sort of layer between the results from Solr and
>>>> presentation to the user where some additional processing can be performed.
>>>>
>>>> 1) You already know that you want phrase matches to show up higher than
>>>> single matches. In this case, why not do an explicit phrase match first,
>>>> with some slop or as is, based on how close you want the phrase terms to be
>>>> to each other.
>>>> 2) Once you have the results from the first query, fire an OR query with
>>>> your terms and get those results.
>>>> 3) Put results from (2) after (1) and present to the user. This happens in
>>>> the app layer.
>>>>
>>>> This is essentially the same as running a query as such: "Rheumatoid
>>>> Arthritis"~slop OR (Rheumatoid AND Arthritis) but you don't need to worry
>>>> about the ordering because you're sorting your results.
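
A rough sketch of step (3), the app-layer merge, assuming Python and assuming
each result document is a dict carrying a unique key field called "id" (both
the helper name merge_ranked and the field name are hypothetical). The OR
query also returns the phrase matches, so duplicates have to be dropped:

```python
def merge_ranked(phrase_docs, or_docs, id_field="id"):
    """Return phrase matches first, then OR matches not already listed.

    The relative order inside each input list is preserved.
    """
    seen = set()
    merged = []
    for doc in list(phrase_docs) + list(or_docs):
        doc_id = doc[id_field]
        if doc_id in seen:
            continue  # already placed by the phrase query
        seen.add(doc_id)
        merged.append(doc)
    return merged


phrase_hits = [{"id": "3"}, {"id": "1"}]
or_hits = [{"id": "1"}, {"id": "2"}, {"id": "3"}, {"id": "4"}]
print([d["id"] for d in merge_ranked(phrase_hits, or_hits)])
# prints ['3', '1', '2', '4']
```

This merge only orders documents; it does nothing for paging across the two
result sets, sorting or facet counts.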
>>>>
>>>> Now, this will obviously take more time since you're querying twice and
>>>> then doing the additional processing in the app layer, but provided your
>>>> architecture is balanced enough and can cope with a little extra load, I do
>>>> not think that your performance will take that bad a hit. Moreover, since
>>>> you're in a hurry, you could implement this as a quick and dirty solution
>>>> to meet the project goals, provided it fits the acceptance parameters, and
>>>> then later play around with the scoring/sorting and figure out the best
>>>> possible setup to suit your needs.
>>>>
>>>> On Thu, Feb 18, 2016 at 4:22 PM Emir Arnautovic <
>>>> emir.arnautovic@sematext.com> wrote:
>>>>
>>>>> Hi Nitin,
>>>>> Can you send us what your parsed query looks like (from the debug output)?
>>>>>
>>>>> Thanks,
>>>>> Emir
>>>>>
>>>>> On 17.02.2016 08:38, Nitin.K wrote:
>>>>>
>>>>>> Hi Binoy,
>>>>>>
>>>>>> We are searching for both phrases and individual words,
>>>>>> but we want the documents that contain the phrases to come
>>>>>> first in the order and then the individual app.
>>>>>>
>>>>>> termPositions = true is also not working in my case.
>>>>>>
>>>>>> I have also removed the string type from copy fields. Kindly look into the
>>>>>> changed configuration below:
>>>>>>
>>>>>> Hi Emir,
>>>>>>
>>>>>> I have changed the configuration as per your suggestion and added pf2 / pf3.
>>>>>> Yes, I saw the difference, but the ranking is still not being followed
>>>>>> correctly in the case of phrases.
>>>>>>
>>>>>> Changed configuration:
>>>>>>
>>>>>> <field name="topic_title" type="text_general" indexed="true" stored="true"/>
>>>>>> <field name="topTitle" type="text_phrase" indexed="true" stored="false"/>
>>>>>>
>>>>>> <field name="subtopic_title" type="text_general" indexed="true" stored="true"/>
>>>>>> <field name="subTopTitle" type="text_phrase" indexed="true" stored="false"/>
>>>>>>
>>>>>> <field name="index_term" type="text_ws" indexed="true" stored="true" multiValued="true"/>
>>>>>> <field name="indTerm" type="text_phrase" indexed="true" stored="false" multiValued="true"/>
>>>>>>
>>>>>> <field name="drug" type="text_ws" indexed="true" stored="true" multiValued="true"/>
>>>>>> <field name="drugString" type="text_phrase" indexed="true" stored="false" multiValued="true"/>
>>>>>>
>>>>>> <field name="tglData" type="text_phrase" indexed="true" stored="false"/>
>>>>>>
>>>>>> Copy fields again for the reference :
>>>>>>
>>>>>> <copyField source="topic_title" dest="topTitle"/>
>>>>>> <copyField source="subtopic_title" dest="subTopTitle"/>
>>>>>> <copyField source="index_term" dest="indTerm"/>
>>>>>> <copyField source="drug" dest="drugString"/>
>>>>>> <copyField source="content" dest="tglData"/>
>>>>>>
>>>>>> Added following field type:
>>>>>>
>>>>>> <fieldType name="text_phrase" class="solr.TextField"
>>>>>>            positionIncrementGap="100" omitNorms="true">
>>>>>>        <analyzer>
>>>>>>                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>>                <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>                        words="stopwords.txt"/>
>>>>>>                <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>        </analyzer>
>>>>>> </fieldType>
>>>>>>
>>>>>> Removed the string type from the copy fields.
>>>>>>
>>>>>> Changed query:
>>>>>>
>>>>>> http://localhost:8983/solr/tgl/select?q=rheumatoid%20arthritis&wt=xml&tie=1.0&rows=200&q.op=AND&indent=true&defType=edismax&stopwords=true&lowercaseOperators=true&debugQuery=true&
>>>>>> pf=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
>>>>>> pf2=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
>>>>>> pf3=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
>>>>>> qf=topic_title^100 subtopic_title^40 index_term^20 drug^15 content^3
>>>>>>
>>>>>> After making these changes, I am able to get my search results correctly
>>>>>> for a single term, but in the case of phrase search I am still not able to
>>>>>> get the results in the correct order.
>>>>>>
>>>>>> Hi Modassar,
>>>>>>
>>>>>> I tried using mm=100, but the order is still the same.
>>>>>>
>>>>>> Hi Alessandro,
>>>>>>
>>>>>> I have not yet tried the slop parameter. By default it is taken as 1.0
>>>>>> when I look at it in debug mode. I will definitely get back to you, so
>>>>>> let me try this option too.
>>>>>>
>>>>>> All,
>>>>>>
>>>>>> Please let me know if anyone has any other suggestion on this. I have to
>>>>>> implement it on an urgent basis, and I think I am very close to it. Thanks
>>>>>> to all of you. I have reached this level just because of you guys.
>>>>>>
>>>>>> Thanks and Regards,
>>>>>> Nitin
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>>
>>>>> http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257782.html
>>>>>
>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>>
>>>>>
>>>>> --
>>>>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>>>>> Solr & Elasticsearch Support * http://sematext.com/
>>>>>
>>>>> --
>>>>>
>>>> Regards,
>>>> Binoy Dalal
>>>>
>>>>
>>>
>>>
>>> --
>>> --------------------------
>>>
>>> Benedetti Alessandro
>>> Visiting card : http://about.me/alessandro_benedetti
>>>
>>> "Tyger, tyger burning bright
>>> In the forests of the night,
>>> What immortal hand or eye
>>> Could frame thy fearful symmetry?"
>>>
>>> William Blake - Songs of Experience -1794 England
>>>
>>>
> --
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland
>


-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

--001a113cc1142fe6e5052c1ce863--