Message-ID: <429B5A2C.5020903@gmx.de>
Date: Mon, 30 May 2005 20:23:40 +0200
From: Falko Guderian
To: java-user@lucene.apache.org
Subject: Indexing problem

Hi,

I indexed 20 documents and want to evaluate my Lucene index. To do that, I extract all terms together with their frequencies in each document. The following code has helped a lot:
-------------------------------------------------------------
TermEnum terms = indexReader.terms(new Term("content", ""));
try {
    while ("content".equals(terms.term().field())) {
        TermDocs termDocs = indexReader.termDocs();
        termDocs.seek(terms);
        // ... collect terms.term().text() ...
        int frequency = 0;
        for (int i = 0; i < indexWriter.numDocs(); i++) {
            ...
            frequency = termDocs.freq();
            ...
            termDocs.next();
        }
        if (!terms.next()) break;
    }
} finally {
    terms.close();
}
-------------------------------------------------------------

But there is an anomaly. In the first document (termDocs.doc() = 0), all term frequencies are greater than 0. That isn't correct: the first document doesn't contain every term.

Do you know this problem? How can I get the correct term frequencies for all documents?

Best regards
Falko Guderian
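
A possible explanation, offered as a sketch rather than a confirmed diagnosis (the elided parts of the loop are unknown): a posting list only enumerates the documents that actually contain the term, and TermDocs.doc() and TermDocs.freq() are documented as invalid until next() has been called. A loop that runs once per document and treats its counter as the document number can therefore assign a term's first posting to document 0. The plain-Java model below illustrates this. It does not use Lucene; the names PostingsModel, Cursor, frequenciesPerDoc and frequenciesByCounter are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;

/** Plain-Java model of a posting list. This is NOT the Lucene API. */
class PostingsModel {

    /** Hypothetical stand-in for TermDocs: a cursor over (doc, freq) pairs. */
    static final class Cursor {
        private final int[] docs;
        private final int[] freqs;
        private int pos = -1; // positioned before the first posting

        Cursor(int[] docs, int[] freqs) {
            this.docs = docs;
            this.freqs = freqs;
        }

        /** Advances to the next posting; false once the list is exhausted. */
        boolean next() {
            return ++pos < docs.length;
        }

        int doc() {
            return docs[pos];
        }

        int freq() {
            return freqs[pos];
        }
    }

    /** Advance first, then read doc() and freq(); absent documents stay 0. */
    static int[] frequenciesPerDoc(Cursor cursor, int numDocs) {
        int[] result = new int[numDocs];
        while (cursor.next()) {
            result[cursor.doc()] = cursor.freq();
        }
        return result;
    }

    /** Flawed variant: the loop counter is mistaken for the document number. */
    static int[] frequenciesByCounter(Cursor cursor, int numDocs) {
        int[] result = new int[numDocs];
        for (int i = 0; i < numDocs && cursor.next(); i++) {
            result[i] = cursor.freq();
        }
        return result;
    }

    public static void main(String[] args) {
        // A term that occurs twice in document 3 and five times in document 7.
        int[] docs = {3, 7};
        int[] freqs = {2, 5};
        System.out.println(Arrays.toString(
                frequenciesPerDoc(new Cursor(docs, freqs), 10)));
        System.out.println(Arrays.toString(
                frequenciesByCounter(new Cursor(docs, freqs), 10)));
    }
}
```

In the flawed variant, document 0 appears to contain the term although only documents 3 and 7 do. With the Lucene classes from the snippet above, the corresponding pattern would presumably be to call termDocs.seek(terms) and then loop with while (termDocs.next()), reading termDocs.doc() and termDocs.freq() inside the loop body.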