lucene-java-user mailing list archives

From Meeraj Kunnumpurath <meeraj.kunnumpur...@asyska.com>
Subject Re: Search Ranking
Date Wed, 16 May 2012 20:52:24 GMT
This is the output I get from calling explain() for each hit of the query "Takeaway fred@company.com":

Found 2 hits.
1. XYZ Takeaway fred@company.com
0.5148823 = (MATCH) sum of:
  0.17162743 = (MATCH) weight(searchText:takeaway in 1), product of:
    0.57735026 = queryWeight(searchText:takeaway), product of:
      0.5945349 = idf(docFreq=2, maxDocs=2)
      0.97109574 = queryNorm
    0.29726744 = (MATCH) fieldWeight(searchText:takeaway in 1), product of:
      1.0 = tf(termFreq(searchText:takeaway)=1)
      0.5945349 = idf(docFreq=2, maxDocs=2)
      0.5 = fieldNorm(field=searchText, doc=1)
  0.34325486 = (MATCH) sum of:
    0.17162743 = (MATCH) weight(searchText:fred in 1), product of:
      0.57735026 = queryWeight(searchText:fred), product of:
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.97109574 = queryNorm
      0.29726744 = (MATCH) fieldWeight(searchText:fred in 1), product of:
        1.0 = tf(termFreq(searchText:fred)=1)
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5 = fieldNorm(field=searchText, doc=1)
    0.17162743 = (MATCH) weight(searchText:company.com in 1), product of:
      0.57735026 = queryWeight(searchText:company.com), product of:
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.97109574 = queryNorm
      0.29726744 = (MATCH) fieldWeight(searchText:company.com in 1), product of:
        1.0 = tf(termFreq(searchText:company.com)=1)
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5 = fieldNorm(field=searchText, doc=1)

2. ABC Takeaway fred@company.com fred@company.com
0.49279732 = (MATCH) sum of:
  0.12872057 = (MATCH) weight(searchText:takeaway in 0), product of:
    0.57735026 = queryWeight(searchText:takeaway), product of:
      0.5945349 = idf(docFreq=2, maxDocs=2)
      0.97109574 = queryNorm
    0.22295058 = (MATCH) fieldWeight(searchText:takeaway in 0), product of:
      1.0 = tf(termFreq(searchText:takeaway)=1)
      0.5945349 = idf(docFreq=2, maxDocs=2)
      0.375 = fieldNorm(field=searchText, doc=0)
  0.36407676 = (MATCH) sum of:
    0.18203838 = (MATCH) weight(searchText:fred in 0), product of:
      0.57735026 = queryWeight(searchText:fred), product of:
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.97109574 = queryNorm
      0.31529972 = (MATCH) fieldWeight(searchText:fred in 0), product of:
        1.4142135 = tf(termFreq(searchText:fred)=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.375 = fieldNorm(field=searchText, doc=0)
    0.18203838 = (MATCH) weight(searchText:company.com in 0), product of:
      0.57735026 = queryWeight(searchText:company.com), product of:
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.97109574 = queryNorm
      0.31529972 = (MATCH) fieldWeight(searchText:company.com in 0), product of:
        1.4142135 = tf(termFreq(searchText:company.com)=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.375 = fieldNorm(field=searchText, doc=0)
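The numbers above can be reproduced by hand. What follows is a minimal plain-Java sketch that does not call Lucene; it assumes the Lucene 3.x DefaultSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), queryNorm = 1 / sqrt(sum of idf squared), and a length norm of 1 / sqrt(number of tokens) squeezed into a single byte the way SmallFloat.floatToByte315 does). The class name ExplainArithmetic is hypothetical. The token counts assume StandardAnalyzer splits fred@company.com into "fred" and "company.com", which is what the explain output shows. The point it illustrates: 1/sqrt(6) = 0.408 for the six-token ABC document is stored as 0.375, while the four-token XYZ document keeps 0.5, and that gap outweighs the sqrt(2) tf boost ABC gets for repeating fred@company.com.

```java
// Hypothetical helper: recomputes Lucene 3.x DefaultSimilarity scores by hand.
final class ExplainArithmetic {

    // idf = 1 + ln(numDocs / (docFreq + 1))
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // tf = sqrt(termFreq)
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // lengthNorm = 1/sqrt(numTokens), encoded to one byte and decoded again,
    // mirroring SmallFloat.floatToByte315 / byte315ToFloat (assumption).
    static float storedFieldNorm(int numTokens) {
        float lengthNorm = (float) (1.0 / Math.sqrt(numTokens));
        int bits = Float.floatToRawIntBits(lengthNorm);
        int zero = (63 - 15) << 3;           // offset for zero exponent 15
        int small = (bits >> 21) - zero;     // keep exponent + top mantissa bits
        if (small <= 0) {
            small = (bits <= 0) ? 0 : 1;     // underflow
        } else if (small >= 0x100) {
            small = 0xFF;                    // overflow
        }
        if (small == 0) {
            return 0f;
        }
        return Float.intBitsToFloat((small << 21) + ((63 - 15) << 24));
    }

    // Sum over query terms of queryWeight * fieldWeight, where
    // queryWeight = idf * queryNorm and fieldWeight = tf * idf * fieldNorm.
    // Assumes every query term matches (coord = 1) and all terms share docFreq,
    // both true for the two documents in this thread.
    static float score(int[] freqs, int docFreq, int numDocs, int numTokens) {
        float idf = idf(docFreq, numDocs);
        float queryNorm = (float) (1.0 / Math.sqrt(freqs.length * idf * idf));
        float norm = storedFieldNorm(numTokens);
        float sum = 0f;
        for (int freq : freqs) {
            sum += (idf * queryNorm) * (tf(freq) * idf * norm);
        }
        return sum;
    }

    public static void main(String[] args) {
        // ABC doc: abc takeaway fred company.com fred company.com -> 6 tokens
        // XYZ doc: xyz takeaway fred company.com                  -> 4 tokens
        System.out.println("fieldNorm ABC = " + storedFieldNorm(6));
        System.out.println("fieldNorm XYZ = " + storedFieldNorm(4));

        // "Takeaway fred@company.com" -> takeaway, fred, company.com
        System.out.println("XYZ: " + score(new int[] {1, 1, 1}, 2, 2, 4));
        System.out.println("ABC: " + score(new int[] {1, 2, 2}, 2, 2, 6));

        // "fred@company.com" -> fred, company.com
        System.out.println("XYZ: " + score(new int[] {1, 1}, 2, 2, 4));
        System.out.println("ABC: " + score(new int[] {2, 2}, 2, 2, 6));
    }
}
```

The stored norms print as 0.375 and 0.5. The two-term scores come out at about 0.5149 (XYZ) and 0.4928 (ABC), matching the explain output; for the email-only query the order flips (about 0.420 for XYZ vs 0.446 for ABC), which is the reversal reported in this thread.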

On Wed, May 16, 2012 at 9:50 PM, Meeraj Kunnumpurath <
meeraj.kunnumpurath@asyska.com> wrote:

> The actual query is
>
> Query q = new QueryParser(Version.LUCENE_35, "searchText",
> analyzer).parse("Takeaway fred@company.com");
>
> If I use
>
> Query q = new QueryParser(Version.LUCENE_35, "searchText",
> analyzer).parse("fred@company.com");
>
> I get them in the reverse order.
>
> Regards
> Meeraj
>
>
> On Wed, May 16, 2012 at 9:48 PM, Meeraj Kunnumpurath <
> meeraj.kunnumpurath@asyska.com> wrote:
>
>> I have tried the same using Lucene directly with the following code,
>>
>> import org.apache.lucene.store.RAMDirectory;
>> import org.apache.lucene.document.Document;
>> import org.apache.lucene.document.Field;
>> import org.apache.lucene.index.IndexWriterConfig;
>> import org.apache.lucene.util.Version;
>> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>> import org.apache.lucene.index.IndexWriter;
>> import org.apache.lucene.queryParser.QueryParser;
>> import org.apache.lucene.index.IndexReader;
>> import org.apache.lucene.search.IndexSearcher;
>> import org.apache.lucene.search.Query;
>> import org.apache.lucene.search.TopScoreDocCollector;
>> import org.apache.lucene.search.ScoreDoc;
>>
>> public class LuceneTest {
>>
>>     public static void main(String[] args) throws Exception {
>>
>>         StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
>>         RAMDirectory index = new RAMDirectory();
>>         IndexWriterConfig config =
>>                 new IndexWriterConfig(Version.LUCENE_35, analyzer);
>>         IndexWriter indexWriter = new IndexWriter(index, config);
>>
>>         // doc1 contains fred@company.com twice, doc2 only once
>>         Document doc1 = new Document();
>>         doc1.add(new Field("searchText",
>>                 "ABC Takeaway fred@company.com fred@company.com",
>>                 Field.Store.YES, Field.Index.ANALYZED));
>>         Document doc2 = new Document();
>>         doc2.add(new Field("searchText", "XYZ Takeaway fred@company.com",
>>                 Field.Store.YES, Field.Index.ANALYZED));
>>
>>         indexWriter.addDocument(doc1);
>>         indexWriter.addDocument(doc2);
>>         indexWriter.close();
>>
>>         Query q = new QueryParser(Version.LUCENE_35, "searchText",
>>                 analyzer).parse("Takeaway");
>>
>>         int hitsPerPage = 10;
>>         IndexReader reader = IndexReader.open(index);
>>         IndexSearcher searcher = new IndexSearcher(reader);
>>         // collect the top hits, ordered by score
>>         TopScoreDocCollector collector =
>>                 TopScoreDocCollector.create(hitsPerPage, true);
>>         searcher.search(q, collector);
>>         ScoreDoc[] hits = collector.topDocs().scoreDocs;
>>
>>         System.out.println("Found " + hits.length + " hits.");
>>         for (int i = 0; i < hits.length; ++i) {
>>             int docId = hits[i].doc;
>>             Document d = searcher.doc(docId);
>>             System.out.println((i + 1) + ". " + d.get("searchText"));
>>         }
>>
>>         // release the searcher and the underlying reader
>>         searcher.close();
>>         reader.close();
>>     }
>>
>> }
>>
>> The output is ..
>>
>> Found 2 hits.
>> 1. XYZ Takeaway fred@company.com
>> 2. ABC Takeaway fred@company.com fred@company.com
>>
>>
>> On Wed, May 16, 2012 at 9:21 PM, Meeraj Kunnumpurath <
>> meeraj.kunnumpurath@asyska.com> wrote:
>>
>>> Thanks Ivan.
>>>
>>> I don't use Lucene directly; it is used behind the scenes by the Neo4J
>>> graph database for full-text indexing. According to their documentation,
>>> full-text indexes use a whitespace tokenizer in the analyser. Yes, I do
>>> get Listing 2 first now. However, if I exclude the term "Takeaway" from the
>>> search string and just search for "fred@company.com", I get Listing 1 first.
>>>
>>> Regards
>>> Meeraj
>>>
>>>
>>> On Wed, May 16, 2012 at 8:49 PM, Ivan Brusic <ivan@brusic.com> wrote:
>>>
>>>> Use the explain function to understand why the query is producing the
>>>> results you see.
>>>>
>>>>
>>>> http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/Searcher.html#explain(org.apache.lucene.search.Query, int)
>>>>
>>>> Does your current query return Listing 2 first? That might be because
>>>> of term frequencies. Which analyzers are you using?
>>>>
>>>> http://www.lucidimagination.com/content/scaling-lucene-and-solr#d0e63
>>>>
>>>> Cheers,
>>>>
>>>> Ivan
>>>>
>>>> On Wed, May 16, 2012 at 12:41 PM, Meeraj Kunnumpurath
>>>> <meeraj.kunnumpurath@asyska.com> wrote:
>>>> > Hi,
>>>> >
>>>> > I am quite new to Lucene. I am trying to use it to index listings of
>>>> > local businesses. The index has only one field, which stores the
>>>> > attributes of a listing as well as the email addresses of users who
>>>> > have rated that business.
>>>> >
>>>> > For example,
>>>> >
>>>> > Listing 1: "XYZ Takeaway London fred@company.com barney@company.com fred@company.com"
>>>> > Listing 2: "ABC Takeaway London fred@company.com barney@company.com"
>>>> >
>>>> > Now, when the user searches for "Takeaway fred@company.com", how do I
>>>> > get Listing 1 to always come before Listing 2, given that the term
>>>> > fred@company.com appears twice in it whereas Listing 2 has it only once?
>>>> >
>>>> > Regards
>>>> > Meeraj
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>
>>
>
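On the original question of making the listing that repeats fred@company.com rank first: the explain output in this thread shows the difference comes from fieldNorm (0.375 for the longer ABC document vs 0.5 for XYZ), that is, from length normalization. Below is a plain-Java sketch, not calling Lucene, that recomputes the same Lucene 3.x DefaultSimilarity arithmetic (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), queryNorm = 1 / sqrt(sum of idf squared)) under the assumption that norms are omitted for the field, so every document gets a norm of 1.0. In Lucene 3.x that would typically mean indexing with Field.Index.ANALYZED_NO_NORMS instead of Field.Index.ANALYZED; whether Neo4j's full-text index exposes such a setting is an assumption not covered here. The class name NoNormsSketch is hypothetical.

```java
// Hypothetical comparison: DefaultSimilarity-style scoring with norms omitted.
final class NoNormsSketch {

    // idf = 1 + ln(numDocs / (docFreq + 1))
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // freqs[i] = frequency of query term i in the document; all terms match
    // and share the same docFreq (true for the two test documents).
    static float score(int[] freqs, int docFreq, int numDocs, float fieldNorm) {
        float idf = idf(docFreq, numDocs);
        float queryNorm = (float) (1.0 / Math.sqrt(freqs.length * idf * idf));
        float sum = 0f;
        for (int freq : freqs) {
            // queryWeight * fieldWeight, with tf = sqrt(freq)
            sum += (idf * queryNorm) * ((float) Math.sqrt(freq) * idf * fieldNorm);
        }
        return sum;
    }

    public static void main(String[] args) {
        float noNorm = 1.0f; // omitted norms: every document gets the same norm

        // "Takeaway fred@company.com" -> takeaway, fred, company.com
        float abc = score(new int[] {1, 2, 2}, 2, 2, noNorm);
        float xyz = score(new int[] {1, 1, 1}, 2, 2, noNorm);
        System.out.println("two-term query:   ABC=" + abc + " XYZ=" + xyz);

        // "fred@company.com" -> fred, company.com
        float abcEmail = score(new int[] {2, 2}, 2, 2, noNorm);
        float xyzEmail = score(new int[] {1, 1}, 2, 2, noNorm);
        System.out.println("email-only query: ABC=" + abcEmail + " XYZ=" + xyzEmail);
    }
}
```

With equal norms, ABC (the document that repeats fred@company.com) should outscore XYZ for both queries, roughly 1.31 vs 1.03 and 1.19 vs 0.84. The trade-off is that in Lucene 3.x omitting norms also discards index-time field and document boosts, and long listings are no longer penalised relative to short ones.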
