From: mark harwood
Date: Thu, 10 Mar 2005 07:36:15 +0000 (GMT)
Subject: Re: Proposed Lucene modification - FieldCollector
To: lucenedev
Message-ID: <20050310073615.60069.qmail@web26010.mail.ukl.yahoo.com>

>> To get complete statistics like above, you
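>> currently have to iterate through the result set and pull each
>> Document from the Hits.

Not necessarily true. You can use TermVectors or an indexed field, e.g. "doctype", to derive this stuff without stored fields.

Here's an example of how I've done it before using indexed fields. I've been meaning to tidy this up and contribute it, as it looks like it could be generally useful. The "GroupingKeyFactory" is an abstraction which allows you to process a term before using it for totalling, e.g. to group dates by year rather than by full date.

// Fragment of a larger class: "reader" (IndexReader) and "analyzer" (Analyzer)
// are fields of the enclosing class.
import java.io.IOException;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

protected GroupTotal[] groupByIndexTokens(GroupQueryParams params)
        throws ParseException, IOException {
    final HashMap totals = new HashMap();
    final GroupingKeyFactory groupKeyFactory = params.getGroupKeyFactory();
    String groupFieldName = params.getGroupFieldName();
    //TODO IndexSearcher should be passed in and reused?
    IndexSearcher searcher = new IndexSearcher(reader);
    float minScore = params.getMinDocScore();
    // Indexed by doc ID, so size by maxDoc(): when the index contains
    // deletions, doc IDs can be >= numDocs().
    final float scores[] = new float[reader.maxDoc()];
    String queryString = params.getQuery();
    if ((queryString == null) || (queryString.trim().length() == 0)) {
        //TODO if query is null then we could optimise counting by just taking docFreq
        // from TermEnum and avoiding use of TermDocs?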
        Arrays.fill(scores, 1);
    } else {
        Query query = QueryParser.parse(queryString, "contents", analyzer);
        // Record the score of every matching document, indexed by doc ID.
        searcher.search(query, null, new HitCollector() {
            public void collect(int docID, float score) {
                scores[docID] = score;
            }
        });
    }
    // Walk the terms of the grouping field, starting at its first term.
    TermEnum te = reader.terms(new Term(groupFieldName, ""));
    Term term = te.term();
    while (term != null) {
        if (term.field().equals(groupFieldName)) {
            TermDocs termDocs = reader.termDocs(term);
            GroupTotal groupTotal = null;
            boolean continueThisTerm = true;
            while ((continueThisTerm) && (termDocs.next())) {
                int docID = termDocs.doc();
                float docScore = scores[docID];
                //TODO include logic to test queryParams.includeZeroScore groups
                if ((docScore > 0) && (docScore > minScore))
                // if(docScore>minScore)
                {
                    if (groupTotal == null) {
                        //look up the group key and initialize
                        String termText = term.text();
                        Object key = termText;
                        if (groupKeyFactory != null) {
                            key = groupKeyFactory.getGroupingKey(termText, docID);
                            if (key == null) {
                                // a null key means: skip the rest of this term
                                continueThisTerm = false;
                                continue;
                            }
                        }
                        groupTotal = (GroupTotal) totals.get(key);
                        if (groupTotal == null) {
                            //no totals exist yet, create new one.
                            groupTotal = new GroupTotal(params.getReturnDocIdsWithGroups());
                            groupTotal.setGroupKey(key);
                            totals.put(key, groupTotal);
                            groupTotal.addToTotalDocFreq(te.docFreq());
                        }
                    }
                    groupTotal.addQueryMatchDoc(docID, docScore);
                }
            }
            termDocs.close();
        } else {
            // moved past the last term of the grouping field
            break;
        }
        if (te.next()) {
            term = te.term();
        } else {
            break;
        }
    }
    te.close();
    Collection result = totals.values();
    GroupTotal[] results = (GroupTotal[]) result.toArray(new GroupTotal[result.size()]);
    return results;
}
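
The message mentions grouping dates by year via the key factory but does not show a factory implementation. The following is only a rough sketch of what one might look like. The interface shape is an assumption inferred from the call getGroupingKey(termText, docID) in the method above; the class YearGroupingKeyFactory, the helper countByYear, and the "yyyyMMdd" term format are hypothetical and not taken from the original code. The helper tallies terms rather than documents, purely to keep the sketch free of Lucene dependencies.

```java
import java.util.Map;
import java.util.TreeMap;

/** Assumed shape of the key factory, inferred from how the method above calls it. */
interface GroupingKeyFactory {
    /** Returns the key to total under, or null to skip this term. */
    Object getGroupingKey(String termText, int docID);
}

/** Hypothetical factory: maps terms of the form yyyyMMdd to their year. */
class YearGroupingKeyFactory implements GroupingKeyFactory {
    public Object getGroupingKey(String termText, int docID) {
        if (termText == null || termText.length() != 8) {
            return null; // not a yyyyMMdd term: caller skips it
        }
        for (int i = 0; i < 8; i++) {
            if (!Character.isDigit(termText.charAt(i))) {
                return null;
            }
        }
        return termText.substring(0, 4); // keep only the year
    }
}

class YearGroupingDemo {
    /** Counts how many terms fall under each key (a stand-in for per-document totals). */
    static Map<String, Integer> countByYear(String[] terms, GroupingKeyFactory factory) {
        Map<String, Integer> totals = new TreeMap<String, Integer>();
        for (int i = 0; i < terms.length; i++) {
            Object key = factory.getGroupingKey(terms[i], i);
            if (key == null) {
                continue; // factory asked to skip this term
            }
            String k = (String) key;
            Integer old = totals.get(k);
            totals.put(k, old == null ? 1 : old + 1);
        }
        return totals;
    }

    public static void main(String[] args) {
        String[] terms = {"20040105", "20041231", "20050310", "n/a"};
        // prints {2004=2, 2005=1}
        System.out.println(countByYear(terms, new YearGroupingKeyFactory()));
    }
}
```

Returning null for a term that does not parse mirrors the null check in the method above, where a null key causes the rest of that term's documents to be skipped.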