lucene-java-user mailing list archives

From Grant Ingersoll <grant.ingers...@gmail.com>
Subject Re: IndexReader.getTermFreqVector penalty
Date Wed, 09 Aug 2006 19:34:20 GMT
Hi Amit,

If you want the frequencies of all the terms (or even just some of the
terms) across all documents, you don't need Term Vectors. Take a
look at TermEnum and TermDocs instead.
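
To illustrate why walking the terms is enough, here is a rough sketch in plain Java. It is a model of an inverted index, not Lucene code: the class PostingsWalk, the Posting type, and the methods index and totalFreqs are all made-up names for illustration. In Lucene 2.0 the outer loop would correspond to the TermEnum returned by IndexReader.terms(), and the inner loop to the TermDocs returned by IndexReader.termDocs(Term), which exposes doc() and freq().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Plain-Java model of an inverted index: term -> postings (docId, freq).
 * All names here are hypothetical; this is NOT Lucene API.
 */
final class PostingsWalk {

    /** One posting: a document id and how often the term occurs in it. */
    static final class Posting {
        final int doc;
        final int freq;
        Posting(int doc, int freq) { this.doc = doc; this.freq = freq; }
    }

    /** Build sorted term -> postings from tokenized docs (docId = array index). */
    static TreeMap<String, List<Posting>> index(String[][] docs) {
        TreeMap<String, List<Posting>> postings = new TreeMap<String, List<Posting>>();
        for (int doc = 0; doc < docs.length; doc++) {
            // Count term occurrences within this one document.
            Map<String, Integer> counts = new TreeMap<String, Integer>();
            for (String token : docs[doc]) {
                Integer c = counts.get(token);
                counts.put(token, c == null ? 1 : c + 1);
            }
            // Append one posting per distinct term in this document.
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                List<Posting> list = postings.get(e.getKey());
                if (list == null) {
                    list = new ArrayList<Posting>();
                    postings.put(e.getKey(), list);
                }
                list.add(new Posting(doc, e.getValue()));
            }
        }
        return postings;
    }

    /** Walk every term once and sum its frequency over all documents. */
    static Map<String, Integer> totalFreqs(TreeMap<String, List<Posting>> postings) {
        Map<String, Integer> totals = new TreeMap<String, Integer>();
        // Outer loop plays the role of a term enumeration.
        for (Map.Entry<String, List<Posting>> e : postings.entrySet()) {
            int sum = 0;
            // Inner loop plays the role of the per-term document list.
            for (Posting p : e.getValue()) {
                sum += p.freq;
            }
            totals.put(e.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String[][] docs = {
            {"lucene", "index", "lucene"},
            {"index", "term"}
        };
        System.out.println(totalFreqs(index(docs)));
    }
}
```

The point of the model is that the term-to-document structure already holds a frequency for every (term, document) pair, so collecting frequencies over the whole index is a single pass over the terms and never touches per-document vectors.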

If you want the frequencies for specific documents, then you do need
Term Vectors. You may save some CPU ticks by keeping only positions
(and not offsets, assuming you don't need them), but I haven't confirmed
this. I just figure it is slightly less data to read and write, so it
should be faster.

No, you do not need to Store your docs to have a term vector.  They  
are written to different parts of the Index.

-Grant

On Aug 9, 2006, at 3:13 PM, Amit Kumar wrote:

> Hi Lucene Users,
>
> I am using Lucene indices to get term frequencies, and I wanted to
> check with you about the time it takes to retrieve them. Please
> suggest how I can improve the code or index, or tell me if this is
> expected. It takes 8 to 9 seconds to retrieve the term frequency
> vectors of all 1030 documents, with an index size of ~530 MB.
>
> Another question: do I need Field.Store.YES to get
> the term freq vector?
>
> Index Details:
> -------------------
> Size: 532 MB,
> 1032 Documents with varying number of terms from 600 to 100,000
> The field is indexed as Field.Store.YES, Field.Index.TOKENIZED,
> Field.TermVector.WITH_POSITIONS_OFFSETS
>
>
> Term Freq Retrieval Time Values:
> -------------------------------------
>
> The time ranges from 8 to 9 seconds
>
>  long start = System.currentTimeMillis();
>  TermFreqVector termFreqVector;
>  for (int i = 0; i < 1030; i++) {
>    // Skip deleted documents.
>    if (!reader.isDeleted(i)) {
>      termFreqVector = reader.getTermFreqVector(i, field);
>    }
>  }
>  long elapsedMs = System.currentTimeMillis() - start;
>
>
> Hardware and Memory Settings:
> -------------------------------------------
> -Xmx2048m -XX:PermSize=16m -XX:MaxPermSize=128m
>
> Dual 1800 MHz Opteron on 32-bit Linux 2.6.15.2; Lucene 2.0.0.
>
>
>
>
> How can I get better results? Can I?
>
>
>
> Many thanks for your help.
> -Amit
>
>
>
>
>
> ---------------------------------------------------------
> Amit Kumar
> Research Programmer
> The Graduate School of Library and Information Science
> University of Illinois, Urbana Champaign IL, 61820
> phone: 217-333-4118 fax: 217-244-3302
> ---------------------------------------------------------
>
>
>
>

------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/




