lucene-java-user mailing list archives

From Apoorv Sharma <mail.apo...@gmail.com>
Subject Counting entries in an index
Date Sat, 20 Feb 2010 21:23:43 GMT
Hello,

I am trying to count the total number of posting entries for terms that have
a given prefix in an index. I also want to count the number of such terms in
the index.

The following is the code I am using for that. The problem is the result is
not as expected.


Can you point out what I am doing wrong?


ASSUMPTION:

Index has had no deletions.

INPUT:

prefix: the prefix that terms should match.

VARIABLES:

set:            a set of the unique terms found in the index that have the given prefix
wordCount:      the number of unique terms in the index that have the given prefix
termFreqCount:  the final result, which will be returned


CODE:

public long countTotalPositingEntriesInIndex(String prefix) {
    int wordCount = 0;
    int documentId = -1;
    long termFreqCount = 0;

    HashSet<String> set = new HashSet<String>();

    for (int i = 0; i < index.length; i++) {
        while (documentId < index[i].getIndexReader().maxDoc() - 1) {
            documentId++;

            try {
                TermFreqVector tfv[] = index[i].getIndexReader()
                        .getTermFreqVectors(documentId);
                if (tfv == null)
                    continue;
                for (int fieldCount = 0; fieldCount < tfv.length; fieldCount++) {
                    String terms[] = tfv[fieldCount].getTerms();
                    int termFreq[] = tfv[fieldCount].getTermFrequencies();
                    for (int termCount = 0; termCount < terms.length; termCount++) {
                        if (terms[termCount].toLowerCase().startsWith(
                                prefix.toLowerCase())) {
                            if (!set.contains(terms[termCount])) {
                                wordCount++;
                                set.add(terms[termCount].toLowerCase());
                            }
                            termFreqCount += termFreq[termCount];
                        }
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    return termFreqCount;
}
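
For comparison, here is a minimal sketch of the same counting logic, with plain Java collections standing in for the readers and term vectors. The class PrefixPostingCounter, the Counts holder, and the nested list/map layout are hypothetical stand-ins for illustration, not Lucene API. It differs from the method above in two places that may explain the unexpected result: the document counter restarts at 0 for each index instead of carrying over from the previous one, and the set is checked and filled with the same lowercased form of the term. Like the method above, it sums term frequencies.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class PrefixPostingCounter {

    /** Holds both results: distinct matching terms and summed term frequencies. */
    public static final class Counts {
        public final int wordCount;
        public final long termFreqCount;

        Counts(int wordCount, long termFreqCount) {
            this.wordCount = wordCount;
            this.termFreqCount = termFreqCount;
        }
    }

    /**
     * Assumption: each index is modelled as a list of documents, and each
     * document as a map from term to its frequency in that document.
     */
    public static Counts count(List<List<Map<String, Integer>>> indexes, String prefix) {
        String lowerPrefix = prefix.toLowerCase(Locale.ROOT);
        Set<String> seen = new HashSet<>();
        long termFreqCount = 0;

        for (List<Map<String, Integer>> index : indexes) {
            // The document counter is local to this loop, so it restarts at 0 per index.
            for (int documentId = 0; documentId < index.size(); documentId++) {
                Map<String, Integer> doc = index.get(documentId);
                if (doc == null) {
                    continue; // mirrors the "no term vectors stored" check
                }
                for (Map.Entry<String, Integer> entry : doc.entrySet()) {
                    // Lowercase once, then use the same form for matching and for the set.
                    String term = entry.getKey().toLowerCase(Locale.ROOT);
                    if (term.startsWith(lowerPrefix)) {
                        seen.add(term);
                        termFreqCount += entry.getValue();
                    }
                }
            }
        }
        return new Counts(seen.size(), termFreqCount);
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> first =
                List.of(Map.of("Apple", 2, "apply", 1), Map.of("banana", 4));
        List<Map<String, Integer>> second =
                List.of(Map.of("apple", 3), Map.of("APT", 1));

        Counts c = count(List.of(first, second), "ap");
        System.out.println(c.wordCount + " terms, " + c.termFreqCount + " occurrences");
    }
}
```

Returning both counts in one object avoids computing wordCount and then discarding it, which is what happens above because only termFreqCount is returned.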
