lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marvin Humphrey <mar...@rectangular.com>
Subject Re: Memory Usage
Date Wed, 16 Nov 2005 05:15:20 GMT
Good stuff, Daniel...

Thanks for taking the time to tabulate the results and present them.   
If your results hold, it may have a significant impact on my  
application.  I'm working on a Perl/XS port, and I think a lot of  
people who want to run it won't be running mod_perl, so startup times  
are quite important to me.  I may end up setting the default  
IndexInterval considerably higher than 128 as a result of this  
discussion.

The formatting of the results turned up a little screwy in my email  
reader, so here's a reformatted version...

> Timings for a simple TermQuery on the term "one" (docFreq = 22):
>
>    skip    time range for query (ms)    approx mem usage of JVM (MB)
>      1      28 ~  30                     49.2
>      2      28 ~  30
>      4      28 ~  30
>      8      29 ~  31
>     16      29 ~  32                     15.9 (!!)
>     32      29 ~  33
>     64      38 ~  42
>    128      59 ~  61
>    256      99 ~ 102                     14.1
>
> Timings for a simple TermQuery on the term "test" (docFreq = 31,356):
>
>    skip    time range for query (ms)
>      1       6.8 ~  7.6
>     16       9.7 ~ 10.2
>    256      69   ~ 72

> So, more frequent terms get a larger penalty due to this modification,
> but the time was relatively fast to start with.  Rarer terms get  
> less of
> a penalty, perhaps because they already take so much longer to find.

This doesn't sound right to me.  The time to locate the term via the  
TermInfosReader shouldn't have anything to do with the doc_freq,  
since that's kept as a single number in .tis and .tii.  Within the  
term dictionary, all terms are more or less created equal.

I'm only passingly familiar with the org.apache.lucene.search  
package, so I'm not sure what could account for this; I would  
normally expect a more common term to take longer, as there are more  
docs to score.  Anybody got a expanation handy?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message