lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Lahiru Samarakoon <lahir...@gmail.com>
Subject Lucene Ranking Problem
Date Tue, 18 Jan 2011 12:12:15 GMT
Dear All,

 I have two documents. The analyzed and the tokenized contents are mentioned
below.

 *Document 1 :*

 *when*, null_1, *my*, null_1, money,

fund, amount, payment, creditcard, credit,

card, *bank, account*, debit, deduct,

*charge*, null_1, my, mobile, usage,

*service*, connection


 *Document 2:*

 *when*, what, time, what, day,

null_1, money, fund, cash, payment,

null_1, i, do, you, i,

null_1, deduct, *charge*, reduce, debit,

from, *my*, *bank, account*, credit,

card, null_1, *adsl*, adsl1, adsl-2,

adsl-1, adsl2, adsl, 1, adsl,

2, usage, connection, *service*


 Then, I searched for the following text.

 *Query:* when my bank account charge adsl service

 *Scores
*

Document 1 = 0.74406385

Document 2 = Score = 0.66067594

 I was expecting to have Document 2 as the top ranked document. But I get
Document 1 as the top ranked even it does not contains  the term “adsl”.

 The word order of the Document 1 matches with the query very well. Can it
be the reason ?

If it is, how can I neglect the word order when searching. (I am not using
phase queries).

My searching code look like below and it is very simple.


 *QueryParser parser = new QueryParser(Version.LUCENE_30, *

*"pattern", *

*new StandardAnalyzer(Version.LUCENE_30)); *

*org.apache.lucene.search.Query query1 =
parser.parse(this.query.getQuestion()); *

*TopDocs hits = is.search(query1, 10); *

 Please advice


Thanks,

Lahiru

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message