From: Erik Hatcher
To: "Lucene Users List"
Date: Sat, 6 Sep 2003 22:44:45 -0400
Subject: Re: Lucene features

On Friday, September 5, 2003, at 07:45 PM, Leo Galambos wrote:

>> And for the second time today... QueryFilter. It allows narrowing
>> the documents queried to only the documents from a previous Query.
>
>
> I guess it would not be an ideal solution. The first query does two
> things: a) it selects a subset from the corpus; b) it assigns a
> relevance to each document of this subset. Your solution omits the
> second point. This implies that the solution will not return "good
> hit lists", because you will not consider the information value of
> the first query, which was given to you by a user.

Yes, you're right. Computing the scores of a second query based on the
scores of the first query is probably not trivial, but it is probably
possible with Lucene. That, combined with a QueryFilter, would do the
trick, I suspect. Somehow the scores of the first query could be
remembered and used as a boost (or some other kind of factor) for the
scores of the second query. Am I off base here?
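The filter-plus-boost idea above can be sketched roughly as follows, assuming each query's results can be read out as a map from document id to score. Plain Java maps stand in for Lucene hit lists here; the class and method names (BoostedRefinement, refine) are hypothetical and are not part of Lucene. Documents the first query did not match are dropped (the QueryFilter part), and each surviving second-query score is multiplied by the first-query score, which is just one possible way to "boost".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: maps of (doc id -> score) stand in for Lucene hit
// lists. This is NOT Lucene's API, only an illustration of the idea.
public class BoostedRefinement {

    /**
     * Keeps only documents matched by the first query (the filtering step)
     * and multiplies each second-query score by that document's
     * first-query score, so the first query's relevance is not discarded.
     */
    public static Map<Integer, Double> refine(Map<Integer, Double> firstHits,
                                              Map<Integer, Double> secondHits) {
        Map<Integer, Double> combined = new HashMap<>();
        for (Map.Entry<Integer, Double> e : secondHits.entrySet()) {
            Double boost = firstHits.get(e.getKey());
            if (boost == null) {
                continue; // filtered out: not in the first query's result set
            }
            combined.put(e.getKey(), e.getValue() * boost);
        }
        return combined;
    }

    public static void main(String[] args) {
        Map<Integer, Double> first = new HashMap<>();
        first.put(1, 0.9);
        first.put(2, 0.2);
        first.put(3, 0.5);

        Map<Integer, Double> second = new HashMap<>();
        second.put(1, 0.4);
        second.put(2, 0.8);
        second.put(4, 1.0); // doc 4 never matched the first query

        Map<Integer, Double> result = refine(first, second);
        for (Map.Entry<Integer, Double> e : result.entrySet()) {
            System.out.println("doc " + e.getKey() + " -> " + e.getValue());
        }
    }
}
```

With this sample data, doc 2 beats doc 1 on the second query alone (0.8 vs 0.4), but after weighting by the first query doc 1 ranks higher (about 0.36 vs 0.16), and doc 4 is filtered out entirely. Multiplication is only one choice; a weighted sum, for instance, would keep a document with a near-zero score on one query from vanishing.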
> Thus I think Chris would need to implement something more complex
> than QueryFilter. If not, the results will be poorer than with the
> commercial packages he may get. He could use a different model where
> "AND" is not an associative operator (i.e. some modification of the
> extended Boolean model). This implies that he would implement it in
> Similarity.java (if I remember that class name correctly).

Right... but you'd still need the filtering capability as well, I
would think, at least for performance reasons.

	Erik
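Leo's point about a non-associative AND can be illustrated with the p-norm AND from the extended Boolean model of Salton, Fox and Wu. The sketch below is a standalone illustration of that operator, not a Lucene Similarity subclass; the class name PNormAnd is hypothetical, and term weights are assumed to lie in [0, 1]. With p = 2, grouping the same three operands in different ways yields three different scores, which is exactly the non-associativity Leo describes.

```java
import java.util.Locale;

// Hypothetical standalone sketch of the p-norm AND operator from the
// extended Boolean model. Not part of Lucene; weights assumed in [0, 1].
public class PNormAnd {

    /**
     * p-norm AND over term weights w_1..w_n:
     *   1 - ( (sum_i (1 - w_i)^p) / n )^(1/p)
     */
    public static double and(double p, double... weights) {
        double sum = 0.0;
        for (double w : weights) {
            sum += Math.pow(1.0 - w, p); // distance from the "all terms present" ideal
        }
        return 1.0 - Math.pow(sum / weights.length, 1.0 / p);
    }

    public static void main(String[] args) {
        double p = 2.0;
        double a = 1.0, b = 0.0, c = 0.0;

        double flat = and(p, a, b, c);          // AND(a, b, c)
        double left = and(p, and(p, a, b), c);  // AND(AND(a, b), c)
        double right = and(p, a, and(p, b, c)); // AND(a, AND(b, c))

        // prints "flat=0.1835 left=0.1340 right=0.2929"
        System.out.println(String.format(Locale.ROOT,
                "flat=%.4f left=%.4f right=%.4f", flat, left, right));
    }
}
```

As p grows, this operator approaches the strict Boolean AND (the minimum of the weights); smaller values of p give partial credit to documents that match only some of the terms.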