lucene-solr-user mailing list archives

From Anshum Gupta <ans...@anshumgupta.net>
Subject Re: [More Like This] Query building
Date Tue, 29 Dec 2015 12:43:12 GMT
Feel free to create a JIRA and put up a patch if you can.

On Tue, Dec 29, 2015 at 4:26 PM, Alessandro Benedetti <abenedetti@apache.org> wrote:

> Hi guys,
> While exploring how we build the More Like This query, I
> discovered a part I am not convinced of:
>
> Let's see how we build the query:
> org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int)
>
> 1) We extract the terms from the interesting fields and add them to a map:
>
> Map<String, Int> termFreqMap = new HashMap<>();
>
> *(We lose the field -> term relation: we no longer know which field the
> term came from!)*
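A minimal sketch of step 1 in plain Java, with no Lucene dependency: per-field token lists (hypothetical, already-analyzed strings) are merged into one term -> frequency map, so a term found in two fields ends up as a single entry with no record of its origin. The class and method names are illustrative only; the real retrieveTerms code differs in its details.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch (not Lucene code) of the flattening in step 1:
// term frequencies from every field are summed into one map keyed only
// by the term text. Plain strings stand in for analyzed tokens.
final class FlatTermFreqSketch {

    static Map<String, Integer> flatten(Map<String, List<String>> tokensByField) {
        Map<String, Integer> termFreqMap = new HashMap<>();
        for (List<String> tokens : tokensByField.values()) {
            for (String token : tokens) {
                // The field name is not part of the key, so it is lost here.
                termFreqMap.merge(token, 1, Integer::sum);
            }
        }
        return termFreqMap;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new HashMap<>();
        doc.put("description", List.of("quiet", "room", "pool"));
        doc.put("facilities", List.of("pool", "wifi"));
        Map<String, Integer> flat = flatten(doc);
        System.out.println(flat.get("pool")); // 2: one count per field, origin unknown
    }
}
```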
>
> org.apache.lucene.queries.mlt.MoreLikeThis#createQueue
>
> 2) We build the queue that will contain the query terms. At this point we
> connect these terms to a field again, but:
>
> ...
>> // go through all the fields and find the largest document frequency
>> String topField = fieldNames[0];
>> int docFreq = 0;
>> for (String fieldName : fieldNames) {
>>   int freq = ir.docFreq(new Term(fieldName, word));
>>   topField = (freq > docFreq) ? fieldName : topField;
>>   docFreq = (freq > docFreq) ? freq : docFreq;
>> }
>> ...
>
>
> We pick topField as the field with the highest document frequency for
> the term (word).
> Then we build the term query on that single field:
>
> queue.add(new ScoreTerm(word, *topField*, score, idf, docFreq, tf));
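The topField selection above can be reproduced outside Lucene with a sketch like the following. It assumes a hypothetical docFreqs map (field -> term -> document frequency) standing in for IndexReader#docFreq(Term), and keeps the loop's semantics: a strict `>` comparison, so ties keep the earlier field and a term with zero frequency everywhere falls back to the first field.

```java
import java.util.List;
import java.util.Map;

// Sketch of the topField selection in createQueue: each term is bound to
// the single field where its document frequency is highest.
// docFreqs is a hypothetical stand-in for IndexReader#docFreq(Term).
final class TopFieldSketch {

    static String topField(String word, List<String> fieldNames,
                           Map<String, Map<String, Integer>> docFreqs) {
        String topField = fieldNames.get(0);
        int docFreq = 0;
        for (String fieldName : fieldNames) {
            int freq = docFreqs.getOrDefault(fieldName, Map.of()).getOrDefault(word, 0);
            // Strictly greater: ties keep the earlier field, as in the original loop.
            if (freq > docFreq) {
                topField = fieldName;
                docFreq = freq;
            }
        }
        return topField;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> docFreqs = Map.of(
            "description", Map.of("pool", 120),
            "facilities", Map.of("pool", 40));
        // "pool" is queried on description only, whichever field it came from.
        System.out.println(topField("pool", List.of("description", "facilities"), docFreqs));
    }
}
```

With these made-up frequencies, pool is bound to description (120 > 40) even if the source document mentioned it only among its facilities.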
>
> In this way we lose a lot of precision, and I'm not sure why we do that.
> I would prefer to keep the relation between terms and fields; it could
> improve the quality of the MLT query a lot.
> For example, if I run MLT on 2 fields, *description* and *facilities*,
> I most likely want to find documents with similar terms in the
> description and similar terms in the facilities, without mixing things
> up and losing the semantics of the terms.
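A sketch of the alternative proposed here, assuming we simply keep a nested field -> (term -> frequency) map instead of the flat one and emit one (field, term, tf) entry per pair. FieldTerm and the method names are hypothetical, not Lucene API, and scoring (idf, queue ordering) is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of keeping the field -> term relation: term frequencies are
// collected per field, and every query term carries its own field.
// FieldTerm and these method names are hypothetical, not Lucene API.
final class PerFieldTermsSketch {

    // A query term bound to the field it was extracted from.
    static final class FieldTerm {
        final String field;
        final String term;
        final int tf;

        FieldTerm(String field, String term, int tf) {
            this.field = field;
            this.term = term;
            this.tf = tf;
        }

        @Override
        public String toString() {
            return field + ":" + term + " (tf=" + tf + ")";
        }
    }

    // field -> (term -> frequency), instead of one flat term -> frequency map.
    static Map<String, Map<String, Integer>> termFreqsByField(Map<String, List<String>> tokensByField) {
        Map<String, Map<String, Integer>> byField = new HashMap<>();
        for (Map.Entry<String, List<String>> e : tokensByField.entrySet()) {
            Map<String, Integer> freqs = byField.computeIfAbsent(e.getKey(), f -> new HashMap<>());
            for (String token : e.getValue()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return byField;
    }

    // One entry per (field, term) pair; scoring and queue ordering are omitted.
    static List<FieldTerm> queryTerms(Map<String, Map<String, Integer>> byField) {
        List<FieldTerm> terms = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> fieldEntry : byField.entrySet()) {
            for (Map.Entry<String, Integer> termEntry : fieldEntry.getValue().entrySet()) {
                terms.add(new FieldTerm(fieldEntry.getKey(), termEntry.getKey(), termEntry.getValue()));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new HashMap<>();
        doc.put("description", List.of("quiet", "room", "pool"));
        doc.put("facilities", List.of("pool", "wifi"));
        // "pool" yields two entries, description:pool and facilities:pool,
        // instead of one entry on whichever field has the higher docFreq.
        for (FieldTerm t : queryTerms(termFreqsByField(doc))) {
            System.out.println(t);
        }
    }
}
```

The design choice is simply to make the field part of the key from extraction onward, so nothing downstream has to guess it back from document frequencies.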
>
> Let me know your opinion,
>
> Cheers
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



-- 
Anshum Gupta
