From: "Uwe Schindler"
Subject: RE: API that return the amount of terms indexed
Date: Sat, 16 Oct 2010 11:52:01 +0200
Message-ID: <003e01cb6d17$c813d750$583b85f0$@thetaphi.de>
References: <1287186677922-1712290.post@n3.nabble.com>

Hi Mike,

As far as I know, 3.0 also has this method:
http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/index/IndexReader.html#getUniqueTermCount()

But it also works only at the segment level! So you have to use
getSequentialSubReaders/ReaderUtil.gatherSubReaders() and do it per segment.
But to get the unique count for the whole index, there is no way around
iterating every term, as duplicates must be removed (which TermEnum does).

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Saturday, October 16, 2010 11:17 AM
> To: java-user@lucene.apache.org
> Subject: Re: API that return the amount of terms indexed
>
> 4.0 will have an API to get the number of unique terms for a given field, or
> across all fields, but only at the segment level. (Getting the count across
> segments requires a merge sort).
>
> 3.x and before doesn't have such an API, though the information is tracked
> under the hood. If you open the _X.tis file, skip the first int, then call
> readLong(), that should be the number of unique terms in that segment.
>
> You can always simply fall back to getting the term enum and stepping
> through it, counting how many .next()'s there are until exhaustion...
>
> Mike
>
> On Fri, Oct 15, 2010 at 7:51 PM, APOLO_11 wrote:
> >
> > hey - is there an API that return the number of term indexed?
> >
> > I found the API that returns the number of documents indexed
> > (IndexWriter.docCount) but can't find an API for the number of terms in
> > the index.
> >
> > any idea ?
> >
> > thanks,d.
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/API-that-return-the-amount-of-terms-indexed-tp1712290p1712290.html
> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
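
As a rough illustration of the point made in this thread (per-segment unique-term counts cannot simply be added up, because a term that occurs in several segments must be counted only once), here is a minimal plain-Java sketch. It does not use the Lucene API. Each segment is modeled as a sorted list of its own unique terms, and the names SegmentTermCounter, sumOfSegmentCounts and countUniqueAcrossSegments are hypothetical, chosen only for this example. The merge is a simple k-way merge over the sorted lists, which is one way to realize the "merge sort" mentioned above, not necessarily how Lucene does it internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Plain-Java stand-in (not Lucene): a "segment" is a sorted list of its unique terms. */
final class SegmentTermCounter {

    /** One cursor per segment: the current term plus an iterator over the remaining terms. */
    private static final class Cursor implements Comparable<Cursor> {
        String term;
        final Iterator<String> rest;

        Cursor(Iterator<String> it) {
            this.rest = it;
            this.term = it.next(); // caller guarantees the segment is non-empty
        }

        /** Moves to the next term; returns false when the segment is exhausted. */
        boolean advance() {
            if (!rest.hasNext()) {
                return false;
            }
            term = rest.next();
            return true;
        }

        @Override
        public int compareTo(Cursor other) {
            return term.compareTo(other.term);
        }
    }

    /** Adds up the per-segment counts; overcounts terms shared between segments. */
    static long sumOfSegmentCounts(List<List<String>> segments) {
        long total = 0;
        for (List<String> segment : segments) {
            total += segment.size();
        }
        return total;
    }

    /** K-way merge over the sorted segments, counting each distinct term once. */
    static long countUniqueAcrossSegments(List<List<String>> segments) {
        PriorityQueue<Cursor> queue = new PriorityQueue<>();
        for (List<String> segment : segments) {
            if (!segment.isEmpty()) {
                queue.add(new Cursor(segment.iterator()));
            }
        }
        long unique = 0;
        String previous = null;
        while (!queue.isEmpty()) {
            Cursor smallest = queue.poll();
            // Terms arrive in sorted order, so a duplicate always equals the previous term.
            if (!smallest.term.equals(previous)) {
                unique++;
                previous = smallest.term;
            }
            if (smallest.advance()) {
                queue.add(smallest); // re-insert with its new current term
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<List<String>> segments = new ArrayList<>();
        segments.add(Arrays.asList("apple", "banana", "cherry"));
        segments.add(Arrays.asList("banana", "date"));
        segments.add(Arrays.asList("apple", "date", "elderberry"));
        System.out.println("sum of per-segment counts: " + sumOfSegmentCounts(segments));
        System.out.println("unique across segments: " + countUniqueAcrossSegments(segments));
    }
}
```

For the three sample segments the per-segment counts add up to 8, while the merged count is 5, since "apple", "banana" and "date" each occur in two segments. In a real index the per-segment term enumerations would take the place of the sorted lists.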