lucene-java-user mailing list archives

From Ivan Brusic <i...@brusic.com>
Subject Re: Search Ranking
Date Thu, 17 May 2012 22:52:08 GMT
If you read the explain output, you can see where the scores are
different. One difference with a noticeable effect is:

1.0 = tf(termFreq(searchText:fred)=1)
0.5 = fieldNorm(field=searchText, doc=1)
vs.
1.4142135 = tf(termFreq(searchText:fred)=2)
0.375 = fieldNorm(field=searchText, doc=0)

As predicted, the term frequencies and norms are affecting the
scoring: doc 0 has the higher tf for fred, but its longer field also
gives it the lower fieldNorm. Try omitting norms on the field and run
your query again.

field.setOmitNorms(true) or Field.Index.ANALYZED_NO_NORMS
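To see how those factors trade off, here is a minimal plain-Java sketch (no Lucene on the classpath) that redoes the arithmetic from the explain output in this thread. It assumes the Lucene 3.x DefaultSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), length norm = 1 / sqrt(numTokens)) and a coord factor of 1, since every query term matches both documents. The one-byte norm encoding is approximated by keeping only three significant bits of the float, an assumption that reproduces the 0.5 and 0.375 norms shown, and the ^100 boost on the nested fred@company.com clause is modelled as a boost on each of its two terms. ExplainArithmetic and its methods are hypothetical names.

```java
// Plain-Java re-run of the explain() arithmetic from this thread (no Lucene needed).
// Assumes Lucene 3.x DefaultSimilarity formulas and coord = 1 (all terms match).
class ExplainArithmetic {

    // tf = sqrt(termFreq): 1 -> 1.0, 2 -> 1.4142135
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // idf = 1 + ln(numDocs / (docFreq + 1)): docFreq 2 of 2 docs -> 0.5945349
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // fieldNorm = 1 / sqrt(numTokens), stored in a single byte. Assumption: that
    // encoding is approximated by truncating the float to 3 significant bits
    // (clear the low 21 mantissa bits), giving 4 tokens -> 0.5, 6 tokens -> 0.375.
    static float fieldNorm(int numTokens, boolean omitNorms) {
        if (omitNorms) {
            return 1.0f; // with norms omitted, every document gets the same 1.0
        }
        float raw = (float) (1.0 / Math.sqrt(numTokens));
        return Float.intBitsToFloat(Float.floatToRawIntBits(raw) & ~0x1FFFFF);
    }

    // Sum over query terms of queryWeight * fieldWeight, where
    // queryWeight = idf * boost * queryNorm and fieldWeight = tf * idf * fieldNorm.
    // Every query term is assumed to have docFreq 2 in a 2-document index.
    static float score(int[] freqs, float[] boosts, int numTokens, boolean omitNorms) {
        float idf = idf(2, 2);
        double sumOfSquaredWeights = 0;
        for (float boost : boosts) {
            double w = idf * boost;
            sumOfSquaredWeights += w * w;
        }
        float queryNorm = (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
        float norm = fieldNorm(numTokens, omitNorms);
        float total = 0f;
        for (int i = 0; i < freqs.length; i++) {
            float queryWeight = idf * boosts[i] * queryNorm;
            float fieldWeight = tf(freqs[i]) * idf * norm;
            total += queryWeight * fieldWeight;
        }
        return total;
    }

    public static void main(String[] args) {
        // Query terms in order: takeaway, fred, company.com (StandardAnalyzer tokens).
        int[] abc = {1, 2, 2}; // "ABC Takeaway fred@company.com fred@company.com": 6 tokens
        int[] xyz = {1, 1, 1}; // "XYZ Takeaway fred@company.com": 4 tokens
        float[] plain = {1f, 1f, 1f};
        // Assumption: ^100 on the nested fred@company.com clause behaves like
        // boosting each of its two terms by 100.
        float[] boosted = {1f, 100f, 100f};

        System.out.printf("with norms:    ABC=%.7f  XYZ=%.7f%n",
                score(abc, plain, 6, false), score(xyz, plain, 4, false));
        System.out.printf("omitted norms: ABC=%.7f  XYZ=%.7f%n",
                score(abc, plain, 6, true), score(xyz, plain, 4, true));
        System.out.printf("^100 boost:    ABC=%.7f  XYZ=%.7f%n",
                score(abc, boosted, 6, false), score(xyz, boosted, 4, false));
    }
}
```

With norms, XYZ comes out ahead (about 0.5149 vs. 0.4928, in line with the explain scores). With norms omitted, ABC's extra occurrences of fred and company.com win, and the ^100 boost on fred@company.com also puts ABC first, consistent with the reversed order reported for that query.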

Cheers,

Ivan

On Wed, May 16, 2012 at 1:54 PM, Meeraj Kunnumpurath
<meeraj.kunnumpurath@asyska.com> wrote:
> Also, if I do the below
>
> Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer).parse("Takeaway fred@company.com^100");
>
> I get them in reverse order. Do I need to boost the term, even if it
> appears more than once in the document?
>
> Regards
> Meeraj
>
> On Wed, May 16, 2012 at 9:52 PM, Meeraj Kunnumpurath <
> meeraj.kunnumpurath@asyska.com> wrote:
>
>> This is the output I get from explaining the plan ..
>>
>>
>> Found 2 hits.
>> 1. XYZ Takeaway fred@company.com
>> 0.5148823 = (MATCH) sum of:
>>   0.17162743 = (MATCH) weight(searchText:takeaway in 1), product of:
>>     0.57735026 = queryWeight(searchText:takeaway), product of:
>>       0.5945349 = idf(docFreq=2, maxDocs=2)
>>       0.97109574 = queryNorm
>>     0.29726744 = (MATCH) fieldWeight(searchText:takeaway in 1), product of:
>>       1.0 = tf(termFreq(searchText:takeaway)=1)
>>       0.5945349 = idf(docFreq=2, maxDocs=2)
>>       0.5 = fieldNorm(field=searchText, doc=1)
>>   0.34325486 = (MATCH) sum of:
>>     0.17162743 = (MATCH) weight(searchText:fred in 1), product of:
>>       0.57735026 = queryWeight(searchText:fred), product of:
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.97109574 = queryNorm
>>       0.29726744 = (MATCH) fieldWeight(searchText:fred in 1), product of:
>>         1.0 = tf(termFreq(searchText:fred)=1)
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.5 = fieldNorm(field=searchText, doc=1)
>>     0.17162743 = (MATCH) weight(searchText:company.com in 1), product of:
>>       0.57735026 = queryWeight(searchText:company.com), product of:
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.97109574 = queryNorm
>>       0.29726744 = (MATCH) fieldWeight(searchText:company.com in 1), product of:
>>         1.0 = tf(termFreq(searchText:company.com)=1)
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.5 = fieldNorm(field=searchText, doc=1)
>>
>>
>> 2. ABC Takeaway fred@company.com fred@company.com
>> 0.49279732 = (MATCH) sum of:
>>   0.12872057 = (MATCH) weight(searchText:takeaway in 0), product of:
>>     0.57735026 = queryWeight(searchText:takeaway), product of:
>>       0.5945349 = idf(docFreq=2, maxDocs=2)
>>       0.97109574 = queryNorm
>>     0.22295058 = (MATCH) fieldWeight(searchText:takeaway in 0), product of:
>>       1.0 = tf(termFreq(searchText:takeaway)=1)
>>       0.5945349 = idf(docFreq=2, maxDocs=2)
>>       0.375 = fieldNorm(field=searchText, doc=0)
>>   0.36407676 = (MATCH) sum of:
>>     0.18203838 = (MATCH) weight(searchText:fred in 0), product of:
>>       0.57735026 = queryWeight(searchText:fred), product of:
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.97109574 = queryNorm
>>       0.31529972 = (MATCH) fieldWeight(searchText:fred in 0), product of:
>>         1.4142135 = tf(termFreq(searchText:fred)=2)
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.375 = fieldNorm(field=searchText, doc=0)
>>     0.18203838 = (MATCH) weight(searchText:company.com in 0), product of:
>>       0.57735026 = queryWeight(searchText:company.com), product of:
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.97109574 = queryNorm
>>       0.31529972 = (MATCH) fieldWeight(searchText:company.com in 0), product of:
>>         1.4142135 = tf(termFreq(searchText:company.com)=2)
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.375 = fieldNorm(field=searchText, doc=0)
>>
>>
>> On Wed, May 16, 2012 at 9:50 PM, Meeraj Kunnumpurath <
>> meeraj.kunnumpurath@asyska.com> wrote:
>>
>>> The actual query is
>>>
>>> Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer).parse("Takeaway fred@company.com");
>>>
>>> If I use
>>>
>>> Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer).parse("fred@company.com");
>>>
>>> I get them in the reverse order.
>>>
>>> Regards
>>> Meeraj
>>>
>>>
>>> On Wed, May 16, 2012 at 9:48 PM, Meeraj Kunnumpurath <
>>> meeraj.kunnumpurath@asyska.com> wrote:
>>>
>>>> I have tried the same using Lucene directly with the following code,
>>>>
>>>> import org.apache.lucene.store.RAMDirectory;
>>>> import org.apache.lucene.document.Document;
>>>> import org.apache.lucene.document.Field;
>>>> import org.apache.lucene.index.IndexWriterConfig;
>>>> import org.apache.lucene.util.Version;
>>>> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>>>> import org.apache.lucene.index.IndexWriter;
>>>> import org.apache.lucene.queryParser.QueryParser;
>>>> import org.apache.lucene.index.IndexReader;
>>>> import org.apache.lucene.search.IndexSearcher;
>>>> import org.apache.lucene.search.Query;
>>>> import org.apache.lucene.search.TopScoreDocCollector;
>>>> import org.apache.lucene.search.ScoreDoc;
>>>>
>>>> public class LuceneTest {
>>>>
>>>>     public static void main(String[] args) throws Exception {
>>>>
>>>>         StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
>>>>         RAMDirectory index = new RAMDirectory();
>>>>         IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, analyzer);
>>>>         IndexWriter indexWriter = new IndexWriter(index, config);
>>>>
>>>>         Document doc1 = new Document();
>>>>         doc1.add(new Field("searchText", "ABC Takeaway fred@company.com fred@company.com",
>>>>                 Field.Store.YES, Field.Index.ANALYZED));
>>>>         Document doc2 = new Document();
>>>>         doc2.add(new Field("searchText", "XYZ Takeaway fred@company.com",
>>>>                 Field.Store.YES, Field.Index.ANALYZED));
>>>>
>>>>         indexWriter.addDocument(doc1);
>>>>         indexWriter.addDocument(doc2);
>>>>         indexWriter.close();
>>>>
>>>>         Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer).parse("Takeaway");
>>>>
>>>>         int hitsPerPage = 10;
>>>>         IndexReader reader = IndexReader.open(index);
>>>>         IndexSearcher searcher = new IndexSearcher(reader);
>>>>         TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
>>>>         searcher.search(q, collector);
>>>>         ScoreDoc[] hits = collector.topDocs().scoreDocs;
>>>>
>>>>         System.out.println("Found " + hits.length + " hits.");
>>>>         for(int i=0;i<hits.length;++i) {
>>>>             int docId = hits[i].doc;
>>>>             Document d = searcher.doc(docId);
>>>>             System.out.println((i + 1) + ". " + d.get("searchText"));
>>>>         }
>>>>
>>>>     }
>>>>
>>>> }
>>>>
>>>> The output is ..
>>>>
>>>> Found 2 hits.
>>>> 1. XYZ Takeaway fred@company.com
>>>> 2. ABC Takeaway fred@company.com fred@company.com
>>>>
>>>>
>>>> On Wed, May 16, 2012 at 9:21 PM, Meeraj Kunnumpurath <
>>>> meeraj.kunnumpurath@asyska.com> wrote:
>>>>
>>>>> Thanks Ivan.
>>>>>
>>>>> I don't use Lucene directly; it is used behind the scenes by the Neo4J
>>>>> graph database for full-text indexing. According to their documentation,
>>>>> they use a whitespace tokenizer in the analyser for full-text indexes.
>>>>> Yes, I do get Listing 2 first now. However, if I exclude the term
>>>>> "Takeaway" from the search string and just search for
>>>>> "fred@company.com", I get Listing 1 first.
>>>>>
>>>>> Regards
>>>>> Meeraj
>>>>>
>>>>>
>>>>> On Wed, May 16, 2012 at 8:49 PM, Ivan Brusic <ivan@brusic.com> wrote:
>>>>>
>>>>>> Use the explain function to understand why the query is producing the
>>>>>> results you see.
>>>>>>
>>>>>> http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/Searcher.html#explain(org.apache.lucene.search.Query, int)
>>>>>>
>>>>>> Does your current query return Listing 2 first? That might be because
>>>>>> of term frequencies. Which analyzers are you using?
>>>>>>
>>>>>> http://www.lucidimagination.com/content/scaling-lucene-and-solr#d0e63
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Ivan
>>>>>>
>>>>>> On Wed, May 16, 2012 at 12:41 PM, Meeraj Kunnumpurath
>>>>>> <meeraj.kunnumpurath@asyska.com> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > I am quite new to Lucene. I am trying to use it to index listings of
>>>>>> > local businesses. The index has only one field, which stores the
>>>>>> > attributes of a listing as well as the email addresses of users who
>>>>>> > have rated that business.
>>>>>> >
>>>>>> > For example,
>>>>>> >
>>>>>> > Listing 1: "XYZ Takeaway London fred@company.com barney@company.com fred@company.com"
>>>>>> > Listing 2: "ABC Takeaway London fred@company.com barney@company.com"
>>>>>> >
>>>>>> > Now when the user searches for "Takeaway fred@company.com", how do I
>>>>>> > get listing 1 to always come before listing 2? Listing 1 contains the
>>>>>> > term fred@company.com twice, whereas listing 2 contains it only once.
>>>>>> >
>>>>>> > Regards
>>>>>> > Meeraj
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
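On the analyzer question: with whitespace-only tokenization, which is how Meeraj describes Neo4J's full-text indexes, an address such as fred@company.com stays a single term, whereas the StandardAnalyzer in the standalone test splits it into fred and company.com, as the explain output shows. Below is a minimal plain-Java sketch that counts tokens and raw term frequencies for the two listings from the original question, assuming a bare split on whitespace with no lowercasing (Neo4J's real analyzer may differ); WhitespaceTermCounts is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: whitespace-only tokenization, no lowercasing or filtering.
class WhitespaceTermCounts {

    // Split on runs of whitespace; an empty or blank string yields no tokens.
    static String[] tokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Raw term frequency of every distinct token, in first-seen order.
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokenize(text)) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String listing1 = "XYZ Takeaway London fred@company.com barney@company.com fred@company.com";
        String listing2 = "ABC Takeaway London fred@company.com barney@company.com";
        for (String listing : new String[] {listing1, listing2}) {
            System.out.println(tokenize(listing).length + " tokens, tf(fred@company.com)="
                    + termFreqs(listing).get("fred@company.com"));
        }
    }
}
```

For Listing 1 this gives 6 tokens with fred@company.com occurring twice; for Listing 2, 5 tokens with a single occurrence. Those counts are what tf and the length norm are computed from.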
