lucene-java-user mailing list archives

From Meeraj Kunnumpurath <meeraj.kunnumpur...@asyska.com>
Subject Re: Search Ranking
Date Wed, 16 May 2012 20:54:56 GMT
Also, if I do the below

Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer).parse("Takeaway fred@company.com^100");

I get them in reverse order. Do I need to boost the term, even if it
appears more than once in the document?

Regards
Meeraj
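
On the boosting question, here is a minimal, stdlib-only Java sketch of the classic Lucene 3.x (DefaultSimilarity) arithmetic. It plugs in the idf (0.5945349) and fieldNorm values (0.5 and 0.375) printed in the explain output quoted below. It makes three assumptions. The query behaves like three flat term clauses (takeaway, fred, company.com); the explain output nests fred and company.com in a sub-query, but with every term matching that nesting gives the same numbers. Every term matches, so coord is 1. `^100` acts as a boost of 100 on both the fred and company.com clauses. `BoostSketch` and `score` are hypothetical names, not Lucene API. Under these assumptions the boost does not reward repetition directly. It makes the e-mail terms dominate, and for those terms the document with two occurrences comes out slightly ahead: √2 × 0.375 ≈ 0.530 versus 1 × 0.5.

```java
/**
 * Hypothetical sketch of classic Lucene 3.x (DefaultSimilarity) scoring for a flat
 * OR of term queries in which every term matches (coord = 1). The idf and fieldNorm
 * values are copied from the explain output in this thread.
 */
final class BoostSketch {

    // idf(docFreq=2, maxDocs=2) as printed by explain; identical for all three terms
    static final float IDF = 0.5945349f;

    /**
     * @param boosts    per-term query boost, in the order takeaway, fred, company.com
     * @param freqs     per-term frequency in the document, same order
     * @param fieldNorm the document's decoded length norm
     */
    static float score(float[] boosts, int[] freqs, float fieldNorm) {
        // queryNorm = 1 / sqrt(sum of (idf * boost)^2)
        float sumOfSquaredWeights = 0f;
        for (float b : boosts) {
            sumOfSquaredWeights += (IDF * b) * (IDF * b);
        }
        float queryNorm = (float) (1.0 / Math.sqrt(sumOfSquaredWeights));

        float total = 0f;
        for (int i = 0; i < boosts.length; i++) {
            float queryWeight = IDF * boosts[i] * queryNorm;                    // idf * boost * queryNorm
            float fieldWeight = (float) Math.sqrt(freqs[i]) * IDF * fieldNorm; // tf * idf * fieldNorm
            total += queryWeight * fieldWeight;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] xyzFreqs = {1, 1, 1}; // "XYZ Takeaway fred@company.com"
        int[] abcFreqs = {1, 2, 2}; // "ABC Takeaway fred@company.com fred@company.com"
        float[] plain = {1f, 1f, 1f};
        float[] boosted = {1f, 100f, 100f};

        System.out.println("no boost:   XYZ=" + score(plain, xyzFreqs, 0.5f)
                + " ABC=" + score(plain, abcFreqs, 0.375f));
        System.out.println("^100 boost: XYZ=" + score(boosted, xyzFreqs, 0.5f)
                + " ABC=" + score(boosted, abcFreqs, 0.375f));
    }
}
```

Without the boost this reproduces the explain totals (about 0.5149 for XYZ and 0.4928 for ABC). With the boost, ABC moves ahead (roughly 0.45 versus 0.42), which fits the reversed order reported above.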

On Wed, May 16, 2012 at 9:52 PM, Meeraj Kunnumpurath <
meeraj.kunnumpurath@asyska.com> wrote:

> This is the output I get from explaining the query:
>
>
> Found 2 hits.
> 1. XYZ Takeaway fred@company.com
> 0.5148823 = (MATCH) sum of:
>   0.17162743 = (MATCH) weight(searchText:takeaway in 1), product of:
>     0.57735026 = queryWeight(searchText:takeaway), product of:
>       0.5945349 = idf(docFreq=2, maxDocs=2)
>       0.97109574 = queryNorm
>     0.29726744 = (MATCH) fieldWeight(searchText:takeaway in 1), product of:
>       1.0 = tf(termFreq(searchText:takeaway)=1)
>       0.5945349 = idf(docFreq=2, maxDocs=2)
>       0.5 = fieldNorm(field=searchText, doc=1)
>   0.34325486 = (MATCH) sum of:
>     0.17162743 = (MATCH) weight(searchText:fred in 1), product of:
>       0.57735026 = queryWeight(searchText:fred), product of:
>         0.5945349 = idf(docFreq=2, maxDocs=2)
>         0.97109574 = queryNorm
>       0.29726744 = (MATCH) fieldWeight(searchText:fred in 1), product of:
>         1.0 = tf(termFreq(searchText:fred)=1)
>         0.5945349 = idf(docFreq=2, maxDocs=2)
>         0.5 = fieldNorm(field=searchText, doc=1)
>     0.17162743 = (MATCH) weight(searchText:company.com in 1), product of:
>       0.57735026 = queryWeight(searchText:company.com), product of:
>         0.5945349 = idf(docFreq=2, maxDocs=2)
>         0.97109574 = queryNorm
>       0.29726744 = (MATCH) fieldWeight(searchText:company.com in 1), product of:
>         1.0 = tf(termFreq(searchText:company.com)=1)
>         0.5945349 = idf(docFreq=2, maxDocs=2)
>         0.5 = fieldNorm(field=searchText, doc=1)
>
>
> 2. ABC Takeaway fred@company.com fred@company.com
> 0.49279732 = (MATCH) sum of:
>   0.12872057 = (MATCH) weight(searchText:takeaway in 0), product of:
>     0.57735026 = queryWeight(searchText:takeaway), product of:
>       0.5945349 = idf(docFreq=2, maxDocs=2)
>       0.97109574 = queryNorm
>     0.22295058 = (MATCH) fieldWeight(searchText:takeaway in 0), product of:
>       1.0 = tf(termFreq(searchText:takeaway)=1)
>       0.5945349 = idf(docFreq=2, maxDocs=2)
>       0.375 = fieldNorm(field=searchText, doc=0)
>   0.36407676 = (MATCH) sum of:
>     0.18203838 = (MATCH) weight(searchText:fred in 0), product of:
>       0.57735026 = queryWeight(searchText:fred), product of:
>         0.5945349 = idf(docFreq=2, maxDocs=2)
>         0.97109574 = queryNorm
>       0.31529972 = (MATCH) fieldWeight(searchText:fred in 0), product of:
>         1.4142135 = tf(termFreq(searchText:fred)=2)
>         0.5945349 = idf(docFreq=2, maxDocs=2)
>         0.375 = fieldNorm(field=searchText, doc=0)
>     0.18203838 = (MATCH) weight(searchText:company.com in 0), product of:
>       0.57735026 = queryWeight(searchText:company.com), product of:
>         0.5945349 = idf(docFreq=2, maxDocs=2)
>         0.97109574 = queryNorm
>       0.31529972 = (MATCH) fieldWeight(searchText:company.com in 0), product of:
>         1.4142135 = tf(termFreq(searchText:company.com)=2)
>         0.5945349 = idf(docFreq=2, maxDocs=2)
>         0.375 = fieldNorm(field=searchText, doc=0)
>
>
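
The explain output above can be reproduced by hand, and the detail that decides the ranking is fieldNorm. Lucene 3.x's DefaultSimilarity computes it as 1/√(number of tokens in the field) and stores it in a single byte. 1/√4 = 0.5 survives the round trip intact, but 1/√6 ≈ 0.408 comes back as 0.375, so the longer document is penalised more than the raw formula suggests. The stdlib-only Java sketch below mirrors the encode/decode logic of Lucene 3.x's `SmallFloat.floatToByte315` and `byte315ToFloat`. It makes four assumptions: DefaultSimilarity, no index-time boosts, every query term matching (coord = 1), and the token counts StandardAnalyzer produced here (4 and 6, with fred@company.com split into fred and company.com, as the explain output shows). The class and method names are hypothetical.

```java
/**
 * Hypothetical sketch reproducing the explain output with Lucene 3.x
 * DefaultSimilarity formulas. Assumes every query term matches (coord = 1),
 * no index-time boosts, and a two-document index.
 */
final class ExplainByHand {

    // idf = 1 + ln(numDocs / (docFreq + 1))
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Encode a norm into one byte (3 mantissa bits, zero exponent 15),
    // following the logic of Lucene 3.x SmallFloat.floatToByte315.
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return (byte) -1;
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Decode the stored byte, following SmallFloat.byte315ToFloat.
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // fieldNorm as seen at search time: 1/sqrt(tokens) after the lossy byte round trip
    static float fieldNorm(int numTokens) {
        return byte315ToFloat(floatToByte315((float) (1.0 / Math.sqrt(numTokens))));
    }

    // Sum over terms of (idf * queryNorm) * (sqrt(tf) * idf * fieldNorm); all terms share one idf here.
    static float score(int[] termFreqs, int numTokens, float idf) {
        float queryNorm = (float) (1.0 / Math.sqrt(termFreqs.length * idf * idf));
        float norm = fieldNorm(numTokens);
        float total = 0f;
        for (int tf : termFreqs) {
            total += (idf * queryNorm) * ((float) Math.sqrt(tf) * idf * norm);
        }
        return total;
    }

    public static void main(String[] args) {
        float idf = idf(2, 2);
        System.out.println("idf = " + idf);
        System.out.println("fieldNorm(4) = " + fieldNorm(4) + ", fieldNorm(6) = " + fieldNorm(6));
        // term order: takeaway, fred, company.com
        System.out.println("XYZ (doc 1) = " + score(new int[] {1, 1, 1}, 4, idf));
        System.out.println("ABC (doc 0) = " + score(new int[] {1, 2, 2}, 6, idf));
    }
}
```

The byte encoding is the reason an extra pair of tokens costs the ABC document more than its second occurrence of the address earns it.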
> On Wed, May 16, 2012 at 9:50 PM, Meeraj Kunnumpurath <
> meeraj.kunnumpurath@asyska.com> wrote:
>
>> The actual query is
>>
>> Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer).parse("Takeaway fred@company.com");
>>
>> If I use
>>
>> Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer).parse("fred@company.com");
>>
>> I get them in the reverse order.
>>
>> Regards
>> Meeraj
>>
>>
>> On Wed, May 16, 2012 at 9:48 PM, Meeraj Kunnumpurath <
>> meeraj.kunnumpurath@asyska.com> wrote:
>>
>>> I have tried the same using Lucene directly with the following code:
>>>
>>> import org.apache.lucene.store.RAMDirectory;
>>> import org.apache.lucene.document.Document;
>>> import org.apache.lucene.document.Field;
>>> import org.apache.lucene.index.IndexWriterConfig;
>>> import org.apache.lucene.util.Version;
>>> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>>> import org.apache.lucene.index.IndexWriter;
>>> import org.apache.lucene.queryParser.QueryParser;
>>> import org.apache.lucene.index.IndexReader;
>>> import org.apache.lucene.search.IndexSearcher;
>>> import org.apache.lucene.search.Query;
>>> import org.apache.lucene.search.TopScoreDocCollector;
>>> import org.apache.lucene.search.ScoreDoc;
>>>
>>> public class LuceneTest {
>>>
>>>     public static void main(String[] args) throws Exception {
>>>
>>>         StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
>>>         RAMDirectory index = new RAMDirectory();
>>>         IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, analyzer);
>>>         IndexWriter indexWriter = new IndexWriter(index, config);
>>>
>>>         // Added first, so it is doc 0 in the explain output: address appears twice
>>>         Document doc1 = new Document();
>>>         doc1.add(new Field("searchText", "ABC Takeaway fred@company.com fred@company.com",
>>>                 Field.Store.YES, Field.Index.ANALYZED));
>>>         // Added second, so it is doc 1: address appears once
>>>         Document doc2 = new Document();
>>>         doc2.add(new Field("searchText", "XYZ Takeaway fred@company.com",
>>>                 Field.Store.YES, Field.Index.ANALYZED));
>>>
>>>         indexWriter.addDocument(doc1);
>>>         indexWriter.addDocument(doc2);
>>>         indexWriter.close();
>>>
>>>         Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer).parse("Takeaway");
>>>
>>>         int hitsPerPage = 10;
>>>         IndexReader reader = IndexReader.open(index);
>>>         IndexSearcher searcher = new IndexSearcher(reader);
>>>         TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
>>>         searcher.search(q, collector);
>>>         ScoreDoc[] hits = collector.topDocs().scoreDocs;
>>>
>>>         // Print the hits in score order
>>>         System.out.println("Found " + hits.length + " hits.");
>>>         for (int i = 0; i < hits.length; ++i) {
>>>             int docId = hits[i].doc;
>>>             Document d = searcher.doc(docId);
>>>             System.out.println((i + 1) + ". " + d.get("searchText"));
>>>         }
>>>
>>>         // Release the searcher and the reader it was opened on
>>>         searcher.close();
>>>         reader.close();
>>>     }
>>>
>>> }
>>>
>>> The output is:
>>>
>>> Found 2 hits.
>>> 1. XYZ Takeaway fred@company.com
>>> 2. ABC Takeaway fred@company.com fred@company.com
>>>
>>>
>>> On Wed, May 16, 2012 at 9:21 PM, Meeraj Kunnumpurath <
>>> meeraj.kunnumpurath@asyska.com> wrote:
>>>
>>>> Thanks Ivan.
>>>>
>>>> I don't use Lucene directly; it is used behind the scenes by the Neo4j
>>>> graph database for full-text indexing. According to their documentation,
>>>> full-text indexes use a whitespace tokenizer in the analyzer. Yes, I do
>>>> get Listing 2 first now. However, if I exclude the term "Takeaway" from
>>>> the search string and search only for "fred@company.com", I get Listing 1
>>>> first.
>>>>
>>>> Regards
>>>> Meeraj
>>>>
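
If Neo4j's full-text index really splits on whitespace only, the e-mail address stays a single token instead of being broken into fred and company.com, as happens in the StandardAnalyzer explain output. The two listings from the original question then differ in length (6 versus 5 tokens). Assuming Neo4j keeps Lucene's default length norms, that difference still works against the listing with the repeated address. The stdlib-only Java sketch below assumes plain whitespace splitting with no lowercasing or other filters. Neo4j's actual analyzer chain may differ, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: whitespace-only tokenization and per-term counts. */
final class WhitespaceTokens {

    // Split on runs of whitespace; no lowercasing, stemming, or stop words.
    static String[] tokenize(String text) {
        return text.trim().split("\\s+");
    }

    // Count how often each token occurs, preserving first-seen order.
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        for (String token : tokenize(text)) {
            freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        String listing1 = "XYZ Takeaway London fred@company.com barney@company.com fred@company.com";
        String listing2 = "ABC Takeaway London fred@company.com barney@company.com";
        for (String listing : new String[] {listing1, listing2}) {
            System.out.println(tokenize(listing).length + " tokens: " + termFreqs(listing));
        }
    }
}
```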
>>>>
>>>> On Wed, May 16, 2012 at 8:49 PM, Ivan Brusic <ivan@brusic.com> wrote:
>>>>
>>>>> Use the explain function to understand why the query is producing the
>>>>> results you see.
>>>>>
>>>>>
>>>>> http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/Searcher.html#explain(org.apache.lucene.search.Query, int)
>>>>>
>>>>> Does your current query return Listing 2 first? That might be because
>>>>> of term frequencies. Which analyzers are you using?
>>>>>
>>>>> http://www.lucidimagination.com/content/scaling-lucene-and-solr#d0e63
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Ivan
>>>>>
>>>>> On Wed, May 16, 2012 at 12:41 PM, Meeraj Kunnumpurath
>>>>> <meeraj.kunnumpurath@asyska.com> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > I am quite new to Lucene. I am trying to use it to index listings of
>>>>> > local businesses. The index has only one field, which stores the
>>>>> > attributes of a listing as well as the email addresses of users who
>>>>> > have rated that business.
>>>>> >
>>>>> > For example,
>>>>> >
>>>>> > Listing 1: "XYZ Takeaway London fred@company.com barney@company.com fred@company.com"
>>>>> > Listing 2: "ABC Takeaway London fred@company.com barney@company.com"
>>>>> >
>>>>> > Now when the user searches for "Takeaway fred@company.com", how do I
>>>>> > get Listing 1 to always come before Listing 2, given that it has the
>>>>> > term fred@company.com twice, whereas Listing 2 has it only once?
>>>>> >
>>>>> > Regards
>>>>> > Meeraj
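
In the two-document test program from this thread, every query term has the same idf and the same query weight. Each document's classic score is therefore proportional to Σ√tf × fieldNorm. The stdlib-only Java sketch below (hypothetical names) makes that comparison using the tf counts and fieldNorm values (0.5 and 0.375) from the explain output in this thread. It also uses a constant fieldNorm of 1.0 to model a field indexed without norms. With norms, the shorter document wins. With a constant norm, the document with the repeated address wins. Lucene 3.x can index a field without norms (for example `Field.Index.ANALYZED_NO_NORMS`), though whether that is acceptable depends on whether field length should matter elsewhere in the ranking.

```java
/** Hypothetical sketch: compare documents by sum(sqrt(tf)) * fieldNorm. */
final class NormEffect {

    // Valid only when every query term has the same idf and weight, as in this thread.
    static double relativeScore(int[] termFreqs, double fieldNorm) {
        double sum = 0.0;
        for (int tf : termFreqs) {
            sum += Math.sqrt(tf);
        }
        return sum * fieldNorm;
    }

    public static void main(String[] args) {
        int[] once = {1, 1, 1};  // takeaway, fred, company.com: address appears once
        int[] twice = {1, 2, 2}; // address appears twice
        System.out.println("with norms:    once=" + relativeScore(once, 0.5)
                + " twice=" + relativeScore(twice, 0.375));
        System.out.println("without norms: once=" + relativeScore(once, 1.0)
                + " twice=" + relativeScore(twice, 1.0));
    }
}
```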
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>>
>>>
>>
>
