lucene-java-user mailing list archives

From Amit Kumar <>
Subject IndexReader.getTermFreqVector penalty
Date Wed, 09 Aug 2006 19:13:21 GMT
Hi Lucene Users,

I am using Lucene indices to get term frequencies, and I wanted to
check with you about the time it takes to retrieve them. Please let me
know whether I can improve the code or the index, or whether this is
expected: it takes 8 to 9 seconds to retrieve the term frequency
vectors of all 1030 documents, with an index size of ~530 MB.

Another question: do I need Field.Store.YES to get the term frequency
vector?

Index Details:
Size: 532 MB
1032 documents, with the number of terms per document varying from 600 to 100,000
The field is indexed with Field.Store.YES

Term Freq Retrieval Time Values:

The time ranges from 8 to 9 seconds.

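  long s = System.currentTimeMillis();
  TermFreqVector termFreqVector;
  for (int i = 0; i < 1030; i++) {
    // Skip document numbers that have been deleted from the index.
    if (!reader.isDeleted(i)) {
      termFreqVector = reader.getTermFreqVector(i, field);
    }
  }
  long l = System.currentTimeMillis();
  // Elapsed time in milliseconds is l - s.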
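
For anyone who wants to reproduce the measurement without my index, here is
a rough, self-contained sketch of the same timing loop. It is not Lucene
code: DocSource is a hypothetical stand-in for the two IndexReader calls
used above (isDeleted and getTermFreqVector), and the fake data in main is
made up purely for illustration. It adds a warm-up pass and counts how many
documents actually returned something, on the assumption that the real call
may return null when no vector is available.

```java
public class TermVectorTiming {

    /** Hypothetical stand-in for the IndexReader calls used in the loop above. */
    interface DocSource {
        int maxDoc();
        boolean isDeleted(int doc);
        Object fetch(int doc); // plays the role of getTermFreqVector(doc, field)
    }

    /** Runs one full pass and returns {documents fetched, elapsed milliseconds}. */
    static long[] timePass(DocSource source) {
        long fetched = 0;
        long start = System.nanoTime();
        for (int i = 0; i < source.maxDoc(); i++) {
            if (!source.isDeleted(i)) {
                // Assumption: a null result means nothing was available for this document.
                if (source.fetch(i) != null) {
                    fetched++;
                }
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        return new long[] { fetched, elapsedMs };
    }

    public static void main(String[] args) {
        // Fake source for illustration: 1030 slots, every tenth one marked deleted.
        DocSource fake = new DocSource() {
            public int maxDoc() { return 1030; }
            public boolean isDeleted(int doc) { return doc % 10 == 0; }
            public Object fetch(int doc) { return Integer.valueOf(doc); }
        };
        timePass(fake);                 // warm-up pass, result discarded
        long[] result = timePass(fake); // measured pass
        System.out.println(result[0] + " docs in " + result[1] + " ms");
    }
}
```

Against a real index, the first pass would also show how much of the 8 to 9
seconds is cold-cache disk I/O compared with the second pass.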

Hardware and Memory Settings:
-Xmx2048m -XX:PermSize=16m -XX:MaxPermSize=128m

Dual 1800 MHz Opteron on 32-bit Linux; Lucene 2.0.0.

How can I get better results? Can I?

Many thanks for your help.

Amit Kumar
Research Programmer
The Graduate School of Library and Information Science
University of Illinois, Urbana Champaign IL, 61820
phone: 217-333-4118 fax: 217-244-3302
