Message-ID: <88c6a672050310192653cd56fc@mail.gmail.com>
Date: Thu, 10 Mar 2005 21:26:52 -0600
From: Chris Lamprecht
To: java-dev@lucene.apache.org
Subject: Re: Proposed Lucene modification - FieldCollector
In-Reply-To: <20050310075859.40927.qmail@web26003.mail.ukl.yahoo.com>
References: <20050310075859.40927.qmail@web26003.mail.ukl.yahoo.com>

Hi Mark,

This is a good idea I hadn't thought of. But I don't think it will work
in my case, since I need the actual whole field values (i.e. "Sun
Microsystems", not just the tokens [sun] or [microsystems]). It might
work if the fields happen to be indexed as keywords, but in my case they
are not.

-chris

On Thu, 10 Mar 2005 07:58:59 +0000 (GMT), mark harwood wrote:
> >>To get complete statistics like above, you currently have to iterate
> >>through the result set and pull each Document from the Hits.
>
> Not necessarily true. You can use TermVectors or an indexed field,
> e.g. "doctype", to derive this stuff without stored fields. Here's an
> example of how I've done it before using indexed fields. I've been
> meaning to tidy this up and contribute it, as it looks like it could
> be generally useful. The "GroupKeyFactory" is an abstraction which
> allows you to process a term before using it for totalling, e.g. to
> group dates on a year rather than a full date.
>
> protected GroupTotal[] groupByIndexTokens(GroupQueryParams params)
>         throws ParseException, IOException
> {
>     final HashMap totals = new HashMap();
>     final GroupingKeyFactory groupKeyFactory = params.getGroupKeyFactory();
>     String groupFieldName = params.getGroupFieldName();
>     //TODO IndexSearcher should be passed in and reused?
>     IndexSearcher searcher = new IndexSearcher(reader);
>     float minScore = params.getMinDocScore();
>     // sized by maxDoc() rather than numDocs(): doc IDs can exceed
>     // numDocs() when the index contains deleted documents
>     final float scores[] = new float[reader.maxDoc()];
>     String queryString = params.getQuery();
>
>     if ((queryString == null) || (queryString.trim().length() == 0))
>     {
>         //TODO if query is null then we could optimise counting by just
>         // taking docFreq from TermEnum and avoiding use of TermDocs?
>         Arrays.fill(scores, 1);
>     }
>     else
>     {
>         Query query = null;
>         query = QueryParser.parse(params.getQuery(), "contents", analyzer);
>         // record the score of every matching document, indexed by doc ID
>         searcher.search(query, null, new HitCollector()
>         {
>             public void collect(int docID, float score)
>             {
>                 scores[docID] = score;
>             }
>         });
>     }
>
>     // walk the terms of the grouping field, starting at its first term
>     TermEnum te = reader.terms(new Term(groupFieldName, ""));
>     Term term = te.term();
>     while (term != null)
>     {
>         if (term.field().equals(groupFieldName))
>         {
>             TermDocs termDocs = reader.termDocs(term);
>             GroupTotal groupTotal = null;
>
>             boolean continueThisTerm = true;
>             while ((continueThisTerm) && (termDocs.next()))
>             {
>                 int docID = termDocs.doc();
>                 float docScore = scores[docID];
>                 //TODO include logic to test queryParams.includeZeroScore groups
>                 if ((docScore > 0) && (docScore > minScore))
>                 // if(docScore>minScore)
>                 {
>                     if (groupTotal == null)
>                     {
>                         //look up the group key and initialize
>                         String termText = term.text();
>                         Object key = termText;
>                         if (groupKeyFactory != null)
>                         {
>                             key = groupKeyFactory.getGroupingKey(termText, docID);
>                             if (key == null)
>                             {
>                                 // factory rejected this term: skip its remaining docs
>                                 continueThisTerm = false;
>                                 continue;
>                             }
>                         }
>                         groupTotal = (GroupTotal) totals.get(key);
>                         if (groupTotal == null)
>                         {
>                             //no totals exist yet, create new one.
>                             groupTotal = new GroupTotal(params.getReturnDocIdsWithGroups());
>                             groupTotal.setGroupKey(key);
>                             totals.put(key, groupTotal);
>                             groupTotal.addToTotalDocFreq(te.docFreq());
>                         }
>                     }
>                     groupTotal.addQueryMatchDoc(docID, scores[docID]);
>                 }
>             }
>         } else
>         {
>             // moved past the last term of the grouping field
>             break;
>         }
>         if (te.next())
>         {
>             term = te.term();
>         }
>         else
>         {
>             break;
>         }
>     }
>     Collection result = totals.values();
>     GroupTotal[] results = (GroupTotal[]) result.toArray(new GroupTotal[result.size()]);
>     return results;
> }
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
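
The two points discussed in this thread, grouping on whole field values versus grouping on index tokens, and using a key factory to coarsen a term before totalling, can be sketched without Lucene. This is only a minimal illustration under stated assumptions: `GroupingSketch`, `tokens`, `countBy` and `YEAR_KEY` are hypothetical names that appear nowhere in the code above, the lowercase/whitespace split merely stands in for a real analyzer, and dates are assumed to be stored as yyyyMMdd strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

public class GroupingSketch {

    // Stand-in for an analyzer (assumption): lowercase, split on whitespace.
    static List<String> tokens(String fieldValue) {
        return Arrays.asList(fieldValue.toLowerCase().trim().split("\\s+"));
    }

    // Totals per group key. keyFn plays the role of a grouping key factory;
    // a null key means "skip this term", mirroring the null check above.
    static Map<String, Integer> countBy(List<String> terms,
                                        Function<String, String> keyFn) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String term : terms) {
            String key = keyFn.apply(term);
            if (key == null) {
                continue;
            }
            totals.merge(key, 1, Integer::sum);
        }
        return totals;
    }

    // Hypothetical key factory: group yyyyMMdd date terms by year.
    static final Function<String, String> YEAR_KEY =
            date -> date.length() >= 4 ? date.substring(0, 4) : null;

    public static void main(String[] args) {
        List<String> companies =
                Arrays.asList("Sun Microsystems", "Sun Microsystems", "Sun Life");

        // Keyword-style field: each whole value is a single term.
        System.out.println(countBy(companies, Function.identity()));

        // Tokenized field: only the individual tokens are available as terms.
        List<String> allTokens = new ArrayList<>();
        for (String company : companies) {
            allTokens.addAll(tokens(company));
        }
        System.out.println(countBy(allTokens, Function.identity()));

        // Key factory applied to date terms before totalling.
        List<String> dates = Arrays.asList("20050310", "20050311", "20041231");
        System.out.println(countBy(dates, YEAR_KEY));
    }
}
```

With the sample data, the keyword-style totals keep "Sun Microsystems" and "Sun Life" apart, while the tokenized totals merge every occurrence of [sun] into one group, which is the limitation raised in the reply.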