lucene-java-user mailing list archives

From Amit Kumar <ami...@uiuc.edu>
Subject IndexReader.getTermFreqVector penalty
Date Wed, 09 Aug 2006 19:13:21 GMT
Hi Lucene Users,

I am using Lucene indices to get term frequencies, and I wanted to
check with you about the time it takes to retrieve them. Please
suggest how I can improve the code or the index, or tell me whether
this is expected. It takes 8 to 9 seconds to retrieve the term
frequency vectors of all 1030 documents, with an index size of ~530 MB.

Another question: do I need Field.Store.YES to get the term
frequency vector?

Index Details:
-------------------
Size: 532 MB
1032 documents, with the number of terms per document varying from
600 to 100,000
The field is indexed as Field.Store.YES, Field.Index.TOKENIZED,
Field.TermVector.WITH_POSITIONS_OFFSETS


Term Freq Retrieval Time Values:
-------------------------------------

The time ranges from 8 to 9 seconds.

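  // Time how long it takes to load the term vector of every live document.
  long s = System.currentTimeMillis();
  TermFreqVector termFreqVector;
  for (int i = 0; i < 1030; i++) {
    // Skip deleted documents.
    if (!reader.isDeleted(i)) {
      termFreqVector = reader.getTermFreqVector(i, field);
    }
  }
  long l = System.currentTimeMillis();
  // Elapsed time in milliseconds: l - s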
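
A single timed pass like the loop above can include one-time costs such
as a cold file cache or JIT compilation. Below is a minimal sketch of a
more repeatable measurement, using only the Java standard library. The
class TimingSketch and its methods are hypothetical names (not part of
Lucene), and the workload in main is only a stand-in for the
getTermFreqVector loop; it also assumes a newer JVM than the one this
test was run on, since it uses lambdas.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: runs a task a few times
// untimed, then times several more runs and reports the median.
class TimingSketch {

    // Runs the task once and returns the elapsed wall-clock time in ms.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    // Median of the samples; for an even count, the lower middle value.
    // Assumes at least one sample.
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[(sorted.length - 1) / 2];
    }

    // Discards `warmups` runs, then returns the median of `runs` timed runs.
    // Assumes runs >= 1.
    static long medianMillis(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            samples[i] = timeMillis(task);
        }
        return median(samples);
    }

    public static void main(String[] args) {
        // Stand-in workload; the real test would run the
        // getTermFreqVector loop over all documents here.
        Runnable task = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                throw new IllegalStateException("unreachable");
            }
        };
        System.out.println("median ms: " + medianMillis(task, 2, 5));
    }
}
```

If the median after warm-up is far below the first-run time, most of the
cost is probably one-time I/O rather than the per-document call itself.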


Hardware and Memory Settings:
-------------------------------------------
-Xmx2048m -XX:PermSize=16m -XX:MaxPermSize=128m

Dual 1800 MHz Opteron on 32-bit Linux 2.6.15.2; Lucene 2.0.0.




How can I get better results? Can I?



Many thanks for your help.
-Amit





---------------------------------------------------------
Amit Kumar
Research Programmer
The Graduate School of Library and Information Science
University of Illinois, Urbana Champaign IL, 61820
phone: 217-333-4118 fax: 217-244-3302
---------------------------------------------------------




