Message-ID: <429B5A2C.5020903@gmx.de>
Date: Mon, 30 May 2005 20:23:40 +0200
From: Falko Guderian <mail.to.falko@gmx.de>
To: java-user@lucene.apache.org
Subject: Indexing problem

Hi,

I have indexed 20 documents and want to evaluate my Lucene index, so I 
extract every term together with its frequency in each document.
The following code has helped a lot.
-------------------------------------------------------------
// Start at the first term of the "content" field.
// Declared outside the try block so that finally can close it.
TermEnum terms = indexReader.terms(new Term("content", ""));
try
{
    while ("content".equals(terms.term().field()))
    {
        TermDocs termDocs = indexReader.termDocs();
        termDocs.seek(terms);
        // ... collect terms.term().text() ...
        int frequency = 0;
        for (int i = 0; i < indexWriter.numDocs(); i++) {
            ...
            frequency = termDocs.freq();
            ...
            termDocs.next();
        }
        if (!terms.next())
            break;
    }
}
finally
{
    terms.close();
}
-------------------------------------------------------------

But there is an anomaly: for the first document (termDocs.doc() == 0), all 
term frequencies are greater than 0.
That cannot be correct, because the first document does not contain all terms.

Do you know this problem? How can I get the correct term frequencies for 
all documents?
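
One possible explanation, sketched under assumptions: as far as the TermDocs 
Javadoc describes it, doc() and freq() are only valid after next() has 
returned true, and next() steps only through the documents that contain the 
term, not through every document number. A loop that counts from 0 to 
numDocs() would then read values that do not belong to document i. The 
self-contained model below illustrates the pattern. PostingCursor and 
TermFrequencyTable are hypothetical stand-ins written for this example, not 
Lucene classes.

```java
/** In-memory stand-in for a postings cursor (hypothetical, NOT a Lucene class). */
final class PostingCursor {
    private final int[] docs;   // ascending numbers of the documents containing the term
    private final int[] freqs;  // freqs[i] is the term frequency in docs[i]
    private int pos = -1;       // -1 means "positioned before the first posting"

    PostingCursor(int[] docs, int[] freqs) {
        this.docs = docs;
        this.freqs = freqs;
    }

    /** Advances to the next posting; returns false once the postings are exhausted. */
    boolean next() {
        if (pos + 1 >= docs.length) {
            pos = docs.length;
            return false;
        }
        pos++;
        return true;
    }

    /** Current document number; only meaningful after next() returned true. */
    int doc() {
        return docs[pos];
    }

    /** Term frequency in the current document; only meaningful after next() returned true. */
    int freq() {
        return freqs[pos];
    }
}

/** Hypothetical helper that turns the postings of one term into a per-document row. */
final class TermFrequencyTable {
    static int[] frequenciesPerDocument(PostingCursor cursor, int numDocs) {
        int[] row = new int[numDocs];       // documents without the term stay at 0
        while (cursor.next()) {             // the cursor drives the loop, not a counter
            row[cursor.doc()] = cursor.freq();
        }
        return row;
    }
}
```

With these stand-ins, a term that occurs twice in document 1 and five times 
in document 3 of a four-document index yields the row [0, 2, 0, 5]. Against 
a real index the same loop would presumably read 
while (termDocs.next()) { row[termDocs.doc()] = termDocs.freq(); }.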


Best regards
Falko Guderian

