Date: Sat, 04 Oct 2003 12:48:29 -0700
From: "none none"
Reply-To: korfut@lycos.com
To: "Lucene Developers List"
Subject: Re: Query Term Collector (was: Re: New highlighter package available)

Hi Mark,

I looked at your code quickly. Can you confirm that the following scenario is what happens when you run a search with a MultiTermQuery?
- construct your query manually or with QueryParser;
- run a search with IndexSearcher;
- the searcher collects all the terms by calling rewrite(IndexReader reader);
- for each document whose terms you need (usually for highlighting), you call
  getTerms(Query query, HashSet terms, boolean prohibited), and because the query
  is a MultiTermQuery, I see that you need to call rewrite(reader) again.

If my assumption is correct, it seems to me that some resources are wasted,
because rewrite() has already been called by the searcher. That's why I added
getTerms() and a few ArrayLists to hold the terms in the current instance of the
query. In my case each user search creates a new Query, so those ArrayLists are
released when the search is finished.

Of course, Mark, there is room for improvement. I agree with you about finding a
"home" for getTerms(), and actually the home is already there, but we don't have
the keys! BooleanQuery's

  private Vector clauses = new Vector();

holds almost the same values my ArrayList does; the difference is that the
prohibited clauses are there as well. I had to make it work as fast and as well
as possible in a short time, but now that you have opened my mind, I believe
getTerms() could get the terms from the clauses Vector inside BooleanQuery.
Maybe a boolean "prohibited" parameter could be passed to skip those clauses
(which would save highlighters some work).

I still believe that Query should have an abstract method getTerms(..);
otherwise we have to switch on the query type to get the terms. A common way is
always better, in my opinion.

Thank you,
Ciao
Korfut

--------- Original Message ---------
DATE: Sat, 4 Oct 2003 09:47:05
From: markharw00d@yahoo.co.uk
To: lucene-dev@jakarta.apache.org
Cc:
>With regards to Korfut's TermCollector proposition:
>I do not like the new requirement for all query classes to implement getTerms(). This is effectively what they are currently
>required to do in the query.rewrite() method: express their high-level logic in primitive terms.
>
>I believe the getTerms() implementation should make use of this existing feature of all query objects (as I have done in
>QueryHighlightExtractor.java), and not create a new set of requirements for all query classes. Let's not add complexity where it's
>not needed.
>So, I think the real question is: should there be a home for a getTerms() function that operates on primitive (rewritten) queries?
>
>We can move some of the logic in QueryHighlightExtractor.java somewhere into the core if the consensus is that
>this is a generally useful feature (though I have yet to think of a use for it outside of highlighting).
>
>Incidentally, it may be of interest to note that I am busy packaging up a getTopTerms() feature that analyses the contents
>of query result sets and returns the "significant" terms and phrases found in the result set, based on their relative frequency
>compared to that of the corpus.
>It's quite effective and useful for query expansion and highlighting.
>This may be of interest to those proposing query.getTerms() changes.
>
>Cheers
>Mark
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
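Korfut's proposal (an abstract getTerms(..) on every query type, plus a flag that lets a highlighter skip prohibited clauses) can be sketched roughly as follows. This is a minimal sketch under stated assumptions: SketchQuery, SketchTermQuery, SketchBooleanQuery, GetTermsDemo and the includeProhibited parameter are hypothetical names, not Lucene's API, and the real BooleanQuery stores its clauses differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for Lucene's query classes, reduced to term extraction.
abstract class SketchQuery {
    // Korfut's idea: each query type reports its own terms.
    // If includeProhibited is false, terms under prohibited (NOT) clauses are skipped.
    abstract void getTerms(Set<String> terms, boolean includeProhibited);
}

class SketchTermQuery extends SketchQuery {
    private final String term;

    SketchTermQuery(String term) {
        this.term = term;
    }

    @Override
    void getTerms(Set<String> terms, boolean includeProhibited) {
        terms.add(term);
    }
}

class SketchBooleanQuery extends SketchQuery {
    // One entry per clause, loosely mirroring the clauses Vector in BooleanQuery.
    private static final class Clause {
        final SketchQuery query;
        final boolean prohibited;

        Clause(SketchQuery query, boolean prohibited) {
            this.query = query;
            this.prohibited = prohibited;
        }
    }

    private final List<Clause> clauses = new ArrayList<>();

    void add(SketchQuery query, boolean prohibited) {
        clauses.add(new Clause(query, prohibited));
    }

    @Override
    void getTerms(Set<String> terms, boolean includeProhibited) {
        for (Clause c : clauses) {
            if (c.prohibited && !includeProhibited) {
                continue; // a highlighter has nothing to mark for NOT terms
            }
            c.query.getTerms(terms, includeProhibited); // recurses into nested queries
        }
    }
}

class GetTermsDemo {
    public static void main(String[] args) {
        SketchBooleanQuery query = new SketchBooleanQuery();
        query.add(new SketchTermQuery("lucene"), false);
        query.add(new SketchTermQuery("highlighter"), false);
        query.add(new SketchTermQuery("spam"), true); // prohibited clause

        Set<String> terms = new LinkedHashSet<>();
        query.getTerms(terms, false);
        System.out.println(terms); // prints [lucene, highlighter]
    }
}
```

The cost of this design is the one Mark objects to: every query class, including third-party ones, must implement getTerms() in addition to rewrite().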
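Mark's alternative keeps term extraction on primitive, already rewritten queries, while Korfut's concern is that a multi-term query then gets rewritten twice (once by the searcher, once by the highlighter). One way to reconcile the two is to rewrite once per search and hand the same primitive terms to both consumers. The sketch below assumes a toy prefix query expanded against a sorted in-memory term dictionary; SketchPrefixQuery, SketchSearchSession and RewriteOnceDemo are hypothetical names, not Lucene classes, and the counter exists only to show that the expansion happens once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical model of a multi-term (prefix) query whose rewrite step expands
// the prefix into concrete terms by scanning the index's term dictionary.
class SketchPrefixQuery {
    private final String prefix;
    private int rewriteCount = 0; // instrumentation only, to show how often we expand

    SketchPrefixQuery(String prefix) {
        this.prefix = prefix;
    }

    // The expensive step: collect every dictionary term that starts with the prefix.
    List<String> rewrite(SortedSet<String> termDictionary) {
        rewriteCount++;
        List<String> expanded = new ArrayList<>();
        for (String term : termDictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // terms are sorted, so all matches are contiguous
            }
            expanded.add(term);
        }
        return expanded;
    }

    int rewriteCount() {
        return rewriteCount;
    }
}

// Hypothetical per-search holder: rewrite once, then share the primitive terms
// between searching and highlighting instead of rewriting a second time.
class SketchSearchSession {
    private final List<String> primitiveTerms;

    SketchSearchSession(SketchPrefixQuery query, SortedSet<String> termDictionary) {
        this.primitiveTerms = Collections.unmodifiableList(query.rewrite(termDictionary));
    }

    List<String> termsForSearch() {
        return primitiveTerms;
    }

    List<String> termsForHighlighting() {
        return primitiveTerms;
    }
}

class RewriteOnceDemo {
    public static void main(String[] args) {
        SortedSet<String> dictionary = new TreeSet<>();
        Collections.addAll(dictionary, "apple", "lucene", "lucid", "luck", "zebra");

        SketchPrefixQuery query = new SketchPrefixQuery("luc");
        SketchSearchSession session = new SketchSearchSession(query, dictionary);
        System.out.println(session.termsForSearch());       // prints [lucene, lucid, luck]
        System.out.println(session.termsForHighlighting()); // same list, no second rewrite
        System.out.println(query.rewriteCount());           // prints 1
    }
}
```

Like Korfut's per-query ArrayLists, the session lives only as long as one user search, so the cached terms are released with it.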
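Mark describes getTopTerms() only as ranking terms by their frequency in the result set relative to the corpus. A guess at what such a ranking could look like is sketched below; it is not his implementation. The score (a term's share of result-set occurrences divided by its share of corpus occurrences) is an assumed formula, phrases are ignored, and SignificantTermsSketch, topTerms and TopTermsDemo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "significant term" ranking: compare how often a term
// occurs in the result set with how often it occurs in the whole corpus.
class SignificantTermsSketch {

    // score(t) = (resultFreq(t) / resultTotal) / (corpusFreq(t) / corpusTotal)
    // A score above 1 means the term is over-represented in the results.
    static List<String> topTerms(Map<String, Integer> resultFreq,
                                 Map<String, Integer> corpusFreq,
                                 int k) {
        long resultTotal = 0;
        for (int f : resultFreq.values()) resultTotal += f;
        long corpusTotal = 0;
        for (int f : corpusFreq.values()) corpusTotal += f;

        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : resultFreq.entrySet()) {
            Integer cf = corpusFreq.get(e.getKey());
            if (e.getValue() <= 0 || cf == null || cf <= 0) {
                continue; // skip terms we cannot score
            }
            double resultShare = (double) e.getValue() / resultTotal;
            double corpusShare = (double) cf / corpusTotal;
            scores.put(e.getKey(), resultShare / corpusShare);
        }

        List<String> ranked = new ArrayList<>(scores.keySet());
        // Highest score first; ties broken alphabetically for a stable result.
        ranked.sort(Comparator.comparing((String t) -> scores.get(t))
                .reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }
}

class TopTermsDemo {
    public static void main(String[] args) {
        Map<String, Integer> corpus = new HashMap<>();
        corpus.put("the", 1000);
        corpus.put("java", 50);
        corpus.put("lucene", 10);
        corpus.put("highlighter", 5);

        Map<String, Integer> results = new HashMap<>();
        results.put("the", 100);
        results.put("java", 10);
        results.put("lucene", 8);
        results.put("highlighter", 3);

        System.out.println(SignificantTermsSketch.topTerms(results, corpus, 2)); // prints [lucene, highlighter]
    }
}
```

Common words such as "the" score below 1 because they are just as frequent everywhere, which is what makes the remaining terms candidates for query expansion or highlighting.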