Mailing-List: lucene-dev@jakarta.apache.org
Subject: RE : Term Collection Frequency?
Date: Wed, 4 Aug 2004 20:34:44 +0200
From: "ABDOU Samir"
To: "Lucene Developers List"

Thanks,

>> What about the frequency of any given term in the whole collection!?
> IndexReader.docFreq(Term t)

This method doesn't give us the collection frequency of the given term t, but the number of documents in which the term appears.

Here is an example of what I want:
-------------------------------
We have this table for a term T:

Doc ID    : 0, 1, 2, 3, 4
Frequency : 3, 5, 4, 2, 5

The term appears 3 times in document 0, 5 times in document 1, and so on. So the collection frequency of this term would be 3+5+4+2+5 = 19.

N.B.: calculating this for each term at runtime would be very expensive! Is it possible to calculate and store this information during indexing?
-------------------------------

Regards,
Samir
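
To make the distinction in the example concrete, here is a minimal sketch in plain Java. It uses no Lucene classes: the per-document frequencies are hard-coded from the table above, and the class and method names (CollectionFrequencySketch, docFreq, collectionFreq) are hypothetical, chosen only for illustration.

```java
// Sketch only: frequencies are copied from the example table, and the
// names below are hypothetical helpers, not part of the Lucene API.
class CollectionFrequencySketch {

    // Document frequency: number of documents in which the term occurs at least once.
    static int docFreq(int[] termFreqPerDoc) {
        int count = 0;
        for (int tf : termFreqPerDoc) {
            if (tf > 0) {
                count++;
            }
        }
        return count;
    }

    // Collection frequency: total occurrences of the term across all documents.
    static long collectionFreq(int[] termFreqPerDoc) {
        long total = 0;
        for (int tf : termFreqPerDoc) {
            total += tf;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] freqs = {3, 5, 4, 2, 5}; // index = doc ID 0..4
        System.out.println("docFreq = " + docFreq(freqs));               // prints "docFreq = 5"
        System.out.println("collectionFreq = " + collectionFreq(freqs)); // prints "collectionFreq = 19"
    }
}
```

Against a real index, the same sum could presumably be obtained by walking the TermDocs returned by IndexReader.termDocs(term) and adding up freq() for each document. That walk visits every document containing the term, which is the runtime cost the question is about.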