lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chong, Herb" <>
Subject RE: Query reformulation (Relevance Feedback) in Lucene?
Date Wed, 03 Dec 2003 13:40:21 GMT
there is no direct support in Lucene for this. there are several strategies for automatic query
expansion and most of them rely on either extensive domain-specific analysis of the top N
documents on the assumption that the search engine performs well enough to guarantee that
the top N documents are all relevant, or that there is a special domain-specific corpus of
"good documents" where the initial search is against these picked documents and their terms
mined to augment the initial query before resubmitting to the original corpus. all of these
things are things you have to do yourself.  term reweighting happens by using term boost.
how much you boost by is an open question.


-----Original Message-----
From: []
Sent: Wednesday, December 03, 2003 6:55 AM
Subject: Query reformulation (Relevance Feedback) in Lucene?

Hello Group of Lucene users,

query reformulation is understood as a effective way to improve retrieval
power significantly. The theory teaches us that it consists of two basic steps:

a) Query expansion (with new terms)
b) Reweighting of the terms in the expanded query

User relevance feedback is the most popular reformulation strategy to
perform query reformulation because it is user centered. 

Does Lucene generally support this approach? Especially I am wondering if

1) there are classes which directly support query expansion OR
2) I would need to do some programming on top of more generic parts? 

I do not know about 1). All I know about 2) is what I think could work with
no evidence if it actually does :-) I think Query expansion with new terms is
easy and would just need to create a new QueryParser object with existing
terms plus the top n (most frequent terms) of the (in the user point of view)
relevant documents. Then I would have a extended query (a). However I do not
know how can I reweight this terms? When I formulate the Query I do not
actually know about there weights since it is done internally. Does anybody have
any idea? Did anybody try to solve this and has some examples which he/she
would like to provide?


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message