lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Counting entries in an index
Date Sat, 20 Feb 2010 22:25:07 GMT
What behavior are you seeing? It's hard to answer "what's wrong"
without knowing which of your expectations isn't being met.
Are you getting 0? A number that seems wrong? Have you examined your index
with Luke to see whether terms that start with your prefix are actually in the
index? Are you getting an exception?

What are you seeing when you step through in the debugger?

Best
Erick

On Sat, Feb 20, 2010 at 4:23 PM, Apoorv Sharma <mail.apoorv@gmail.com> wrote:

> Hello,
>
> I am trying to count the total number of posting entries for terms that
> have a given prefix in an index, and also the number of such terms in the
> index.
>
> The following is the code I am using for that. The problem is that the
> result is not as expected.
>
> Can you point out what I am doing wrong?
>
>
> ASSUMPTION:
>
> Index has had no deletions.
>
> INPUT:
>
> prefix: the prefix that terms should match.
>
> VARIABLES:
>
> set:            the set of unique terms in the index that have the given prefix
> wordCount:      the number of unique terms in the index that have the given prefix
> termFreqCount:  the final result, which will be returned
>
>
> CODE:
>
> public long countTotalPositingEntriesInIndex(String prefix) {
>     int wordCount = 0;
>     int documentId = -1;
>     long termFreqCount = 0;
>
>     HashSet<String> set = new HashSet<String>();
>
>     for (int i = 0; i < index.length; i++) {
>         while (documentId < index[i].getIndexReader().maxDoc() - 1) {
>             documentId++;
>
>             try {
>                 TermFreqVector tfv[] = index[i].getIndexReader()
>                         .getTermFreqVectors(documentId);
>                 if (tfv == null)
>                     continue;
>                 for (int fieldCount = 0; fieldCount < tfv.length; fieldCount++) {
>                     String terms[] = tfv[fieldCount].getTerms();
>                     int termFreq[] = tfv[fieldCount].getTermFrequencies();
>                     for (int termCount = 0; termCount < terms.length; termCount++) {
>                         if (terms[termCount].toLowerCase().startsWith(
>                                 prefix.toLowerCase())) {
>                             if (!set.contains(terms[termCount])) {
>                                 wordCount++;
>                                 set.add(terms[termCount].toLowerCase());
>                             }
>                             termFreqCount += termFreq[termCount];
>                         }
>                     }
>                 }
>             } catch (IOException e) {
>                 e.printStackTrace();
>             }
>         }
>     }
>     return termFreqCount;
> }
>
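
Two details in the quoted code may be worth checking while stepping through it: documentId is declared outside the loop over index and is never reset, so readers after the first are only scanned from wherever the previous one stopped; and the set is probed with the term in its original case but stores the lowercased form, so wordCount can be incremented more than once for the same term. Whether either explains the unexpected result depends on the index, which the thread does not describe. The sketch below models only the counting logic in plain Java, without Lucene: the nested arrays stand in for per-reader, per-document term vectors, and the names PrefixPostingCount, Counts, and count are hypothetical, not part of the original code or of any Lucene API.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class PrefixPostingCount {

    /** Holds the number of distinct matching terms and the summed term frequencies. */
    static final class Counts {
        final int distinctTerms;
        final long totalFreq;

        Counts(int distinctTerms, long totalFreq) {
            this.distinctTerms = distinctTerms;
            this.totalFreq = totalFreq;
        }
    }

    /**
     * terms[r][d] lists the terms of document d in reader r (null if the
     * document has no stored term vector); freqs[r][d] holds the matching
     * frequencies. Both arrays are stand-ins for Lucene term vectors.
     */
    static Counts count(String[][][] terms, int[][][] freqs, String prefix) {
        String lowerPrefix = prefix.toLowerCase(Locale.ROOT);
        Set<String> seen = new HashSet<String>();
        long total = 0;

        for (int r = 0; r < terms.length; r++) {
            // The document counter starts again at 0 for every reader.
            for (int d = 0; d < terms[r].length; d++) {
                if (terms[r][d] == null) {
                    continue; // no term vector for this document
                }
                for (int t = 0; t < terms[r][d].length; t++) {
                    // Lowercase once, then use the same form for matching and for the set.
                    String term = terms[r][d][t].toLowerCase(Locale.ROOT);
                    if (term.startsWith(lowerPrefix)) {
                        seen.add(term); // add() ignores duplicates
                        total += freqs[r][d][t];
                    }
                }
            }
        }
        return new Counts(seen.size(), total);
    }

    public static void main(String[] args) {
        String[][][] terms = {
            { {"Apple", "apricot", "banana"}, null },
            { {"apple", "APEX"} }
        };
        int[][][] freqs = {
            { {2, 1, 5}, null },
            { {3, 4} }
        };
        Counts c = count(terms, freqs, "ap");
        System.out.println(c.distinctTerms + " terms, " + c.totalFreq + " postings");
    }
}
```

With the sample data in main, the matching terms are apple, apricot, and apex, and their frequencies sum to 2 + 1 + 3 + 4, so the program prints "3 terms, 10 postings". Deriving the distinct-term count from the set's size, rather than from a separate counter, avoids the two getting out of step.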
