Hi Amit,
If you want all the freqs of all the terms (or even just some of the
terms) in all documents, you don't need to use Term Vectors, take a
look at TermEnum and TermDocs.
If you want for specific documents, then you do need Term Vectors.
You may get some CPU ticks by only keeping Positions (and not
offsets, assuming you don't need them), but I haven't confirmed
this. I just figure it is slightly less data to read and write so it
should be faster.
No, you do not need to Store your docs to have a term vector. They
are written to different parts of the Index.
-Grant
On Aug 9, 2006, at 3:13 PM, Amit Kumar wrote:
> Hi Lucene Users,
>
> I am using the lucene indices to get term frequencies. I just
> wanted to check with you about the
> time it is taking to retrieve these term freq. Please suggest if I
> can improve the code/index or if
> this is expected. It takes 8 to 9 seconds to retrieve the term freq
> values of all 1030 documents,
> with an index size of ~530MB.
>
> Another question I have is Do I need to have Field.Store.Yes to get
> the term freq vector?
>
> Index Details:
> -------------------
> Size: 532 MB,
> 1032 Documents with varying number of terms from 600 to 100,000
> The field is indexed as Field.Store.YES,
> Field.Index.TOKENIZED,Field.TermVector.WITH_POSITIONS_OFFSETS
>
>
> Term Freq Retrieval Time Values:
> -------------------------------------
>
> The time ranges in 8 to 9 seconds
>
> long s = System.currentTimeMillis();
> TermFreqVector termFreqVector;
> for (int i = 0; i < 1030; i++) {
> if (!reader.isDeleted(i)) {
> termFreqVector = reader.getTermFreqVector(i, field);
> }
> }
> long l = System.currentTimeMillis();
>
>
> Hardware and Memory Settings:
> -------------------------------------------
> -Xmx 2048m -XX:PermSize=16m -XX:MaxPermSize=128m
>
> Dual 1800 MHz Optron on 32 bit Linux 2.6.15.2; Lucene 2.0.0.
>
>
>
>
> How can I get better results? Can I?
>
>
>
> Many thanks for your help.
> -Amit
>
>
>
>
>
> ---------------------------------------------------------
> Amit Kumar
> Research Programmer
> The Graduate School of Library and Information Science
> University of Illinois, Urbana Champaign IL, 61820
> phone: 217-333-4118 fax: 217-244-3302
> ---------------------------------------------------------
>
>
>
>
------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
|