lucene-java-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: 1.4.x TermInfosWriter.indexInterval not public static ?
Date Tue, 01 Mar 2005 17:46:47 GMT
Kevin A. Burton wrote:
> BTW.. can you define "a bit"...

Merriam-Webster says:

   a bit : SOMEWHAT, RATHER

> Is "a bit" 5%?  10%?  Benchmarks would be nice but I'm not that picky.

If you want benchmarks, make benchmarks.

> I just want to see what performance hits/benefits I could see by 
> tweaking the values.

This parameter determines the amount of computation required per query 
term, regardless of the number of documents that contain that term.  In 
particular, it is the maximum number of other terms that must be scanned 
before a term is located and its frequency and position information may 
be processed.  In a large index with user-entered query terms, query 
processing time is likely to be dominated not by term lookup but rather 
by the processing of frequency and positional data.  In a small index or 
when many uncommon query terms are generated (e.g., by wildcard queries) 
term lookup may become a dominant cost.  Benchmarking your application 
is the best way to determine this.
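To make the "maximum number of other terms that must be scanned" concrete, here is a minimal sketch of an interval-based term index. It is not Lucene's actual code: the class TermIndexSketch and its methods are hypothetical names, and the "on disk" dictionary is just a sorted in-memory array. Every interval-th term is kept in a small index; a lookup binary-searches that index and then scans forward through at most interval - 1 other terms.

```java
import java.util.Arrays;

// Hypothetical illustration only; not Lucene's TermInfosWriter/TermInfosReader.
class TermIndexSketch {
    private final String[] terms;      // full sorted term dictionary (stands in for the on-disk file)
    private final String[] indexTerms; // every interval-th term (stands in for the in-memory index)
    private final int interval;
    private int scanned;               // terms scanned past during the most recent lookup

    TermIndexSketch(String[] sortedTerms, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.terms = sortedTerms;
        this.interval = interval;
        int n = (sortedTerms.length + interval - 1) / interval; // ceil(length / interval)
        this.indexTerms = new String[n];
        for (int i = 0; i < n; i++) {
            indexTerms[i] = sortedTerms[i * interval];
        }
    }

    // Returns the position of the term in the dictionary, or -1 if it is absent.
    int lookup(String term) {
        scanned = 0;
        int pos = Arrays.binarySearch(indexTerms, term);
        if (pos < 0) {
            pos = -pos - 2; // last index entry that sorts before the term
        }
        if (pos < 0) {
            return -1;      // term sorts before the first dictionary entry
        }
        int start = pos * interval;
        int end = Math.min(start + interval, terms.length);
        for (int i = start; i < end; i++) {
            int cmp = terms[i].compareTo(term);
            if (cmp == 0) {
                return i;
            }
            if (cmp > 0) {
                break;      // passed the place where the term would be
            }
            scanned++;      // one more "other term" scanned before reaching the target
        }
        return -1;
    }

    int lastScanned() { return scanned; }

    int indexSize() { return indexTerms.length; }
}
```

In this sketch a larger interval shrinks indexTerms but raises the worst-case scan per lookup, and that scan cost is paid once per query term no matter how many documents contain the term. How much it matters relative to processing frequency and position data is what a benchmark of the real application would show.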

There is no single percentage answer.  There are cases where 99% of the 
query processing is in term lookup and there are cases where 1% of the 
query processing is in term lookup.  Chances are that, with a large 
index and user-entered query terms, only a small percentage of the time 
is spent in term lookup and thus increasing this value somewhat will not 
affect overall performance much.

If you need something more precise than "much" or "a bit", measure it.

Doug
