lucene-solr-user mailing list archives

From Binoy Dalal <binoydala...@gmail.com>
Subject Re: SOLR ranking
Date Thu, 18 Feb 2016 18:09:25 GMT
Here's an alternative solution that may be of some help.
I'm assuming that you are not outputting the search results directly to the
user, and that there is some layer between Solr's results and their
presentation to the user where additional processing can be performed.

1) You already know that you want phrase matches to rank higher than
single-term matches. So why not run an explicit phrase query first, with or
without some slop depending on how close you want the phrase terms to be to
each other?
2) Once you have the results from the first query, fire an OR query with
your terms and get those results.
3) Put the results from (2) after those from (1) and present them to the
user. This happens in the app layer.
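Step 3 can be sketched as below. This is a minimal sketch, assuming each query has already returned a list of document ids in Solr's score order (fetching them is left out). Documents that match the phrase also match the individual terms, so the second list normally repeats ids from the first; the sketch drops those repeats. `merge_phrase_first` is a hypothetical helper name.

```python
from typing import Iterable, List


def merge_phrase_first(phrase_ids: Iterable[str], term_ids: Iterable[str]) -> List[str]:
    """Return phrase-query ids first, then the remaining term-query ids.

    Each id appears once; the order inside each input list (Solr's score
    order) is preserved.
    """
    seen = set()
    merged: List[str] = []
    for doc_id in list(phrase_ids) + list(term_ids):
        if doc_id not in seen:  # skip documents already placed by the phrase pass
            seen.add(doc_id)
            merged.append(doc_id)
    return merged


print(merge_phrase_first(["d3", "d1"], ["d1", "d7", "d3", "d9"]))
# ['d3', 'd1', 'd7', 'd9']
```

Because the app layer does the ordering, every phrase match ends up above every term-only match regardless of the relative scores of the two queries.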

This is essentially the same as running a query such as "Rheumatoid
Arthritis"~slop OR (Rheumatoid AND Arthritis), except that you don't need to
worry about how the two clauses score relative to each other, because you
are ordering the results yourself.
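The query strings for the two passes, and the single combined query they are compared with here, could be generated as follows. This is a sketch with hypothetical helper names; the slop value 2 is an arbitrary example, and no escaping of Lucene special characters is attempted.

```python
from typing import List


def phrase_query(terms: List[str], slop: int = 0) -> str:
    """Lucene phrase query, with a ~slop suffix when slop > 0 (pass 1)."""
    phrase = '"' + " ".join(terms) + '"'
    return f"{phrase}~{slop}" if slop > 0 else phrase


def terms_query(terms: List[str], op: str = "OR") -> str:
    """Plain terms joined by a boolean operator (pass 2)."""
    return f" {op} ".join(terms)


def combined_query(terms: List[str], slop: int = 0) -> str:
    """Single-query form: phrase clause OR an all-terms clause."""
    return f"{phrase_query(terms, slop)} OR ({terms_query(terms, 'AND')})"


terms = ["Rheumatoid", "Arthritis"]
print(phrase_query(terms, slop=2))    # "Rheumatoid Arthritis"~2
print(terms_query(terms))             # Rheumatoid OR Arthritis
print(combined_query(terms, slop=2))  # "Rheumatoid Arthritis"~2 OR (Rheumatoid AND Arthritis)
```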

Now, this will obviously take more time, since you're querying twice and
then doing the additional processing in the app layer, but provided your
architecture is balanced and can cope with a little extra load, I don't
think your performance will take much of a hit. Moreover, since you're in a
hurry, you could implement this as a quick and dirty solution to meet the
project goals, provided it fits the acceptance criteria, and later play
around with the scoring/sorting to find the setup that best suits your
needs.

On Thu, Feb 18, 2016 at 4:22 PM Emir Arnautovic <
emir.arnautovic@sematext.com> wrote:

> Hi Nitin,
> Can you send us what your parsed query looks like (from the debug output)?
>
> Thanks,
> Emir
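The parsed query Emir asks for is the `parsedquery` entry of the `debug` section that Solr adds to the response when `debugQuery=true`. A minimal sketch for pulling it out of a response already decoded from `wt=json`; the sample response below is made up for illustration and is not real Solr output.

```python
from typing import Any, Dict


def parsed_query(response: Dict[str, Any]) -> str:
    """Return debug.parsedquery from a decoded Solr JSON response."""
    debug = response.get("debug")
    if debug is None:
        raise KeyError("no 'debug' section; was debugQuery=true sent?")
    return debug["parsedquery"]


# Made-up, trimmed response used only to exercise the function.
sample = {"debug": {"rawquerystring": "rheumatoid arthritis",
                    "parsedquery": "(+(...))"}}
print(parsed_query(sample))
```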
>
> On 17.02.2016 08:38, Nitin.K wrote:
> > Hi Binoy,
> >
> > We are searching for both phrases and individual words,
> > but we want the documents containing the phrase to come first in the
> > order, followed by those matching only the individual words.
> >
> > termPositions = true is also not working in my case.
> >
> > I have also removed the string type from the copy fields. Kindly look at
> > the changed configuration below:
> >
> > Hi Emir,
> >
> > I have changed the configuration as per your suggestion and added pf2 / pf3.
> > Yes, I saw the difference, but the ranking is still not followed
> > correctly in the case of phrases.
> >
> > Changed configuration:
> >
> > <field name="topic_title" type="text_general" indexed="true" stored="true"/>
> > <field name="topTitle" type="text_phrase" indexed="true" stored="false"/>
> >
> > <field name="subtopic_title" type="text_general" indexed="true" stored="true"/>
> > <field name="subTopTitle" type="text_phrase" indexed="true" stored="false"/>
> >
> > <field name="index_term" type="text_ws" indexed="true" stored="true" multiValued="true"/>
> > <field name="indTerm" type="text_phrase" indexed="true" stored="false" multiValued="true"/>
> >
> > <field name="drug" type="text_ws" indexed="true" stored="true" multiValued="true"/>
> > <field name="drugString" type="text_phrase" indexed="true" stored="false" multiValued="true"/>
> >
> > <field name="tglData" type="text_phrase" indexed="true" stored="false"/>
> >
> > Copy fields again, for reference:
> >
> > <copyField source="topic_title" dest="topTitle"/>
> > <copyField source="subtopic_title" dest="subTopTitle"/>
> > <copyField source="index_term" dest="indTerm"/>
> > <copyField source="drug" dest="drugString"/>
> > <copyField source="content" dest="tglData"/>
> >
> > Added the following field type:
> >
> > <fieldType name="text_phrase" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
> >   <analyzer>
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> > </fieldType>
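To see what actually ends up in a `text_phrase` field, the analyzer chain above can be approximated in a few lines. This is a rough sketch, not Lucene's code: Python's `str.split()` stands in for WhitespaceTokenizer, and the stopword set is hypothetical (the real list is whatever `stopwords.txt` contains). Removed stopwords leave a gap in positions, as Lucene's StopFilter does.

```python
from typing import List, Set, Tuple

# Hypothetical stopword list; the real one comes from stopwords.txt.
STOPWORDS: Set[str] = {"a", "an", "and", "of", "the"}


def analyze_text_phrase(text: str, stopwords: Set[str] = STOPWORDS) -> List[Tuple[int, str]]:
    """Approximate text_phrase: whitespace split, stop filter (ignoreCase), lowercase.

    Returns (position, token) pairs.
    """
    stop_lower = {w.lower() for w in stopwords}
    tokens: List[Tuple[int, str]] = []
    for pos, raw in enumerate(text.split()):
        if raw.lower() in stop_lower:
            continue  # stopword dropped, but its position is still consumed
        tokens.append((pos, raw.lower()))
    return tokens


print(analyze_text_phrase("Treatment of Rheumatoid Arthritis, adults"))
# [(0, 'treatment'), (2, 'rheumatoid'), (3, 'arthritis,'), (4, 'adults')]
```

Note the token `arthritis,` keeps its comma: a whitespace tokenizer splits only on whitespace, so in this approximation the phrase `rheumatoid arthritis` would not match that text in a `text_phrase` field.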
> >
> > Removed the string type from the copy fields.
> >
> > Changed query:
> >
> > http://localhost:8983/solr/tgl/select?q=rheumatoid%20arthritis&wt=xml&tie=1.0&rows=200&q.op=AND&indent=true&defType=edismax&stopwords=true&lowercaseOperators=true&debugQuery=true&
> > pf=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
> > pf2=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
> > pf3=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
> > qf=topic_title^100 subtopic_title^40 index_term^20 drug^15 content^3
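Building the same request programmatically makes two things easier: the spaces and `^` characters in `pf`/`qf` get percent-encoded, and the boosted field lists can be checked against the schema. In the quoted query, `pf` names `subtopTitle`, while the schema declares `subTopTitle`; Solr field names are case-sensitive, so these are different names. The sketch below assumes the field set from the schema quoted above (`content` is not shown there, but it must exist as a copyField source); the helper names are hypothetical.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode

# Field names declared in the quoted schema (case-sensitive).
DECLARED = {"topic_title", "topTitle", "subtopic_title", "subTopTitle",
            "index_term", "indTerm", "drug", "drugString", "content", "tglData"}


def parse_boosts(spec: str) -> List[Tuple[str, float]]:
    """Split an edismax field list like 'topTitle^200 indTerm^40' into pairs."""
    pairs = []
    for item in spec.split():
        name, _, boost = item.partition("^")
        pairs.append((name, float(boost) if boost else 1.0))
    return pairs


def unknown_fields(spec: str, declared=DECLARED) -> List[str]:
    """Field names in spec that the schema does not declare."""
    return [name for name, _ in parse_boosts(spec) if name not in declared]


pf = "topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6"
qf = "topic_title^100 subtopic_title^40 index_term^20 drug^15 content^3"
print(unknown_fields(pf))  # ['subtopTitle']
print(unknown_fields(qf))  # []

params: Dict[str, str] = {
    "q": "rheumatoid arthritis", "defType": "edismax", "q.op": "AND",
    "qf": qf, "pf": pf, "pf2": pf, "pf3": pf, "tie": "1.0",
    "rows": "200", "debugQuery": "true", "wt": "xml",
}
# urlencode turns spaces into '+' and '^' into '%5E'.
url = "http://localhost:8983/solr/tgl/select?" + urlencode(params)
print(url)
```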
> >
> > After making these changes, I get my search results in the correct order
> > for a single term, but for a phrase search I still do not get the
> > results in the correct order.
> >
> > Hi Modassar,
> >
> > I tried using mm=100, but the order is still the same.
> >
> > Hi Alessandro,
> >
> > I have not yet tried the slop parameter. When I looked at it in debug
> > mode, it was taking 1.0 by default. I will definitely get back to you;
> > let me try this option too.
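In edismax, `ps` sets the slop applied to the `pf` phrases (`ps2` and `ps3` do the same for `pf2` and `pf3`). The sketch below illustrates what slop allows for a two-term phrase, under a deliberate simplification: it only accepts the terms in their original order and treats list indexes as positions, whereas Lucene's sloppy phrase matching also accepts reordered terms at a higher cost. The function name is hypothetical.

```python
from typing import List


def in_order_within_slop(tokens: List[str], first: str, second: str, slop: int) -> bool:
    """Simplified two-term phrase-slop check.

    True if `second` follows `first` with at most `slop` other positions
    between them. Reordered matches are intentionally not handled.
    """
    firsts = [i for i, t in enumerate(tokens) if t == first]
    seconds = [j for j, t in enumerate(tokens) if t == second]
    return any(0 <= j - i - 1 <= slop for i in firsts for j in seconds)


doc = "rheumatoid juvenile arthritis".split()
print(in_order_within_slop(doc, "rheumatoid", "arthritis", 0))  # False
print(in_order_within_slop(doc, "rheumatoid", "arthritis", 1))  # True
```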
> >
> > All,
> >
> > Please let me know if anyone has any other suggestions. I have to
> > implement this urgently and I think I am very close. Thanks to all of
> > you; I have reached this point only because of you guys.
> >
> > Thanks and Regards,
> > Nitin
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257782.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
>
--
Regards,
Binoy Dalal
