From: Phil Whelan
To: java-user@lucene.apache.org
Date: Sun, 2 Aug 2009 09:48:55 -0700
Subject: Re: Weird discrepancy with term counts vs. terms (off by 1)

Hi Jim,

On Sun, Aug 2, 2009 at 9:08 AM, Phil Whelan wrote:
>
>> So then, I reviewed the index using Luke, and what I saw with that was
>> that there were indeed only 12 "path" terms (under "Term Count" on the
>> left), but, when I clicked the "Show Top Terms" in Luke, there were 13
>> terms listed by Luke.
>
> Yes, I just checked this and this seems to be a bug with Luke. It
> always shows 1 less in "Term Count" than it should. Well spotted.

I was able to see why this was happening in the Luke source, and I've
submitted the following patch to Andrzej, the author of Luke. When the
counting loop reaches the first term of a new field, it creates a fresh
counter for that field but never increments it, so that field's count
comes out one too low.

Thanks,
Phil

--- luke.orig/src/org/getopt/luke/Luke.java	2009-03-19 22:41:34.000000000 -0700
+++ luke-src-0.9.2/src/org/getopt/luke/Luke.java	2009-08-02 09:33:24.000000000 -0700
@@ -813,23 +813,18 @@
       setString(iFields, "text", String.valueOf(idxFields.length));
       Object iTerms = find(pOver, "iTerms");
       termCounts.clear();
-      FieldTermCount ftc = new FieldTermCount();
+      FieldTermCount ftc = null;
       TermEnum te = ir.terms();
       numTerms = 0;
       while (te.next()) {
         Term currTerm = te.term();
-        if (ftc.fieldname == null) {
+        if (ftc == null || ftc.fieldname == null || ftc.fieldname != currTerm.field()) {
           // initialize
-          ftc.fieldname = currTerm.field();
-          termCounts.put(ftc.fieldname, ftc);
-        }
-        if (ftc.fieldname == currTerm.field()) {
-          ftc.termCount++;
-        } else {
           ftc = new FieldTermCount();
           ftc.fieldname = currTerm.field();
           termCounts.put(ftc.fieldname, ftc);
         }
+        ftc.termCount++;
         numTerms++;
       }
       te.close();
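
For anyone who wants to see the off-by-one without building Luke, here is a minimal standalone sketch of the two counting loops. It does not use Lucene at all: `FakeTerm`, `countBuggy`, `countFixed` and `sampleTerms` are hypothetical names made up for this illustration, and the sample data (3 "contents" terms followed by 13 "path" terms) is invented to mirror the 12 vs. 13 numbers from the thread. It also compares field names with `equals()` rather than by reference as the patch does, since the sketch cannot assume interned strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TermCountSketch {

    // Hypothetical stand-in for a (field, text) pair as returned by a term enumeration.
    static final class FakeTerm {
        final String field;
        final String text;

        FakeTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    // Mirrors the pre-patch loop: on a field change a new counter is created,
    // but the term that triggered the change is not counted.
    static Map<String, Integer> countBuggy(List<FakeTerm> terms) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String current = null;
        for (FakeTerm t : terms) {
            if (current == null) {
                current = t.field;
                counts.put(current, 0);
            }
            if (current.equals(t.field)) {
                counts.put(current, counts.get(current) + 1);
            } else {
                current = t.field;
                counts.put(current, 0); // first term of the new field is skipped
            }
        }
        return counts;
    }

    // Mirrors the patched loop: start a new counter when needed,
    // then always count the current term.
    static Map<String, Integer> countFixed(List<FakeTerm> terms) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String current = null;
        for (FakeTerm t : terms) {
            if (current == null || !current.equals(t.field)) {
                current = t.field;
                counts.put(current, 0);
            }
            counts.put(current, counts.get(current) + 1);
        }
        return counts;
    }

    // Invented sample data, grouped by field the way a sorted term enumeration would be.
    static List<FakeTerm> sampleTerms() {
        List<FakeTerm> terms = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            terms.add(new FakeTerm("contents", "word" + i));
        }
        for (int i = 0; i < 13; i++) {
            terms.add(new FakeTerm("path", "/docs/file" + i));
        }
        return terms;
    }

    public static void main(String[] args) {
        List<FakeTerm> terms = sampleTerms();
        System.out.println("buggy: " + countBuggy(terms)); // {contents=3, path=12}
        System.out.println("fixed: " + countFixed(terms)); // {contents=3, path=13}
    }
}
```

In this sketch the first field in the enumeration is counted correctly by both loops; only fields that start after a field change lose one term in the buggy version.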