From: mark harwood
Date: Thu, 10 Mar 2005 07:58:59 +0000 (GMT)
Subject: Re: Proposed Lucene modification - FieldCollector
To: java-dev@lucene.apache.org
Message-ID: <20050310075859.40927.qmail@web26003.mail.ukl.yahoo.com>

>> To get complete statistics like
>> above, you currently have to iterate through the result
>> set and pull each Document from the Hits.

Not necessarily true. You can use TermVectors or an indexed field, e.g. "doctype", to derive this stuff without stored fields.
Here's an example of how I've done it before using indexed fields. I've been meaning to tidy this up and contribute it, as it looks like it could be generally useful.

The "GroupingKeyFactory" is an abstraction which allows you to process a term before using it for totalling, e.g. to group dates by year rather than by full date.

    protected GroupTotal[] groupByIndexTokens(GroupQueryParams params)
            throws ParseException, IOException
    {
        // "reader" (IndexReader) and "analyzer" are fields of the enclosing class.
        final HashMap totals = new HashMap();
        final GroupingKeyFactory groupKeyFactory = params.getGroupKeyFactory();
        String groupFieldName = params.getGroupFieldName();
        //TODO IndexSearcher should be passed in and reused?
        IndexSearcher searcher = new IndexSearcher(reader);
        float minScore = params.getMinDocScore();
        // Sized by maxDoc() rather than numDocs(): doc IDs can exceed
        // numDocs() when the index contains deleted documents.
        final float scores[] = new float[reader.maxDoc()];
        String queryString = params.getQuery();
        if ((queryString == null) || (queryString.trim().length() == 0))
        {
            //TODO if query is null then we could optimise counting by just
            // taking docFreq from TermEnum and avoiding use of TermDocs?
            Arrays.fill(scores, 1);
        }
        else
        {
            Query query = QueryParser.parse(queryString, "contents", analyzer);
            // Record the score of every matching doc, indexed by doc ID.
            searcher.search(query, null, new HitCollector()
            {
                public void collect(int docID, float score)
                {
                    scores[docID] = score;
                }
            });
        }

        // Walk every term of the grouping field.
        TermEnum te = reader.terms(new Term(groupFieldName, ""));
        Term term = te.term();
        while (term != null)
        {
            if (!term.field().equals(groupFieldName))
            {
                break; // moved past the last term of the grouping field
            }
            TermDocs termDocs = reader.termDocs(term);
            GroupTotal groupTotal = null;
            boolean continueThisTerm = true;
            while ((continueThisTerm) && (termDocs.next()))
            {
                int docID = termDocs.doc();
                float docScore = scores[docID];
                //TODO include logic to test queryParams.includeZeroScore groups
                if ((docScore > 0) && (docScore > minScore))
                // if(docScore>minScore)
                {
                    if (groupTotal == null)
                    {
                        //look up the group key and initialize
                        String termText = term.text();
                        Object key = termText;
                        if (groupKeyFactory != null)
                        {
                            key = groupKeyFactory.getGroupingKey(termText, docID);
                            if (key == null)
                            {
                                // factory rejected this term: skip its remaining docs
                                continueThisTerm = false;
                                continue;
                            }
                        }
                        groupTotal = (GroupTotal) totals.get(key);
                        if (groupTotal == null)
                        {
                            //no totals exist yet, create new one.
                            groupTotal = new GroupTotal(params.getReturnDocIdsWithGroups());
                            groupTotal.setGroupKey(key);
                            totals.put(key, groupTotal);
                            groupTotal.addToTotalDocFreq(te.docFreq());
                        }
                    }
                    groupTotal.addQueryMatchDoc(docID, scores[docID]);
                }
            }
            termDocs.close();
            if (te.next())
            {
                term = te.term();
            }
            else
            {
                break;
            }
        }
        te.close();

        Collection result = totals.values();
        GroupTotal[] results =
                (GroupTotal[]) result.toArray(new GroupTotal[result.size()]);
        return results;
    }
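
To illustrate the key-factory idea on its own, here is a minimal sketch of a factory that groups date terms by year. The GroupingKeyFactory interface was not posted, so its shape below is an assumption inferred from the getGroupingKey(termText, docID) call in the method above; YearKeyFactory, YearGroupingSketch and the "yyyyMMdd" term format are hypothetical names and choices made for the illustration. The sketch uses plain in-memory maps in place of TermEnum/TermDocs so that it runs without Lucene, and it asks the factory for a key per document rather than once per term.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Assumed shape of the key factory; the real interface was not posted. */
interface GroupingKeyFactory {
    /** Returns the key to total under, or null to skip this term. */
    Object getGroupingKey(String termText, int docID);
}

/** Hypothetical factory: maps a date term such as "20050310" to "2005". */
class YearKeyFactory implements GroupingKeyFactory {
    public Object getGroupingKey(String termText, int docID) {
        if (termText == null || termText.length() < 4) {
            return null; // not a usable date term
        }
        return termText.substring(0, 4);
    }
}

class YearGroupingSketch {
    /**
     * Counts query-matching docs per group key.
     * postings maps each term to the doc IDs containing it (a stand-in for
     * TermEnum/TermDocs); scores is indexed by doc ID, 0 meaning "no match".
     */
    static Map<Object, Integer> countByKey(Map<String, int[]> postings,
                                           float[] scores,
                                           GroupingKeyFactory factory) {
        Map<Object, Integer> totals = new LinkedHashMap<>();
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int docID : entry.getValue()) {
                if (scores[docID] <= 0) {
                    continue; // doc did not match the query
                }
                Object key = factory.getGroupingKey(entry.getKey(), docID);
                if (key == null) {
                    break; // factory rejected the term: skip its other docs
                }
                totals.merge(key, 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new LinkedHashMap<>();
        postings.put("20040105", new int[] {0});
        postings.put("20040620", new int[] {1, 2});
        postings.put("20050310", new int[] {3});
        float[] scores = {0.9f, 0f, 0.4f, 0.7f}; // doc 1 did not match
        System.out.println(countByKey(postings, scores, new YearKeyFactory()));
        // prints {2004=2, 2005=1}
    }
}
```

Two full-date terms collapse into the single "2004" group, which is the effect the factory is meant to give; passing a different factory changes the grouping without touching the counting loop.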