lucene-dev mailing list archives

From Doug Cutting
Subject Re: caching term information?
Date Mon, 22 May 2006 16:33:21 GMT
Marvin Humphrey wrote:
> On May 20, 2006, at 12:01 AM, Robert Engels wrote:
>> Maybe don't cache the term pages, then, just cache the frequently
>> requested terms themselves.
> That sounds like a winner.  Search term frequencies follow a power law
> distribution.  Cache the top 20% or so in an LRU and you'll cut down on
> disk seeks and linear scanning significantly.

Keep in mind that the .tis file is compressed: it uses far less memory 
per term than a TermInfo does.  So, to minimize disk I/O, one should 
leave things compressed and cache portions of the .tis file instead. 
The OS's buffer cache should do this well for you.  But if the system 
call overhead is causing significant delay, then the .tis file could be 
memory mapped.  And if constructing and scanning TermInfos is the 
primary delay, then, of course, a cache of TermInfos might be 
indicated.  In summary, there are lots of possible places to optimize 
here, but it's not clear which, if any, are warranted.
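
For the memory-mapping option, here is a minimal sketch of mapping a 
file read-only with java.nio.  It is not Lucene code: the class and 
method names (MappedFileSketch, mapReadOnly) are hypothetical, it uses 
newer Java syntax (try-with-resources), and it assumes the file is 
smaller than 2 GB, since a single mapping cannot exceed 
Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper, not part of Lucene: maps a whole file read-only. */
final class MappedFileSketch {
    private MappedFileSketch() {}

    /**
     * Maps the entire file into memory. Reads from the returned buffer are
     * served from the OS page cache without a read() system call per access.
     * Assumes the file is at most Integer.MAX_VALUE bytes long.
     */
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }
}
```

The mapped bytes are still in the compressed on-disk form, so this only 
removes system call overhead; decoding cost is unchanged.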

Folks have benchmarked a TermInfo cache before and not found it 
advantageous.  But perhaps your uses are sufficiently different that this 
is no longer the case.
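
If you want to benchmark it on your own workload, the LRU idea from 
the quoted text can be sketched with LinkedHashMap in access order.  
This is an illustration only, not Lucene's implementation: the class 
name TermLruCache is hypothetical, and the key and value types stand 
in for whatever term and TermInfo representation you use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache sketch; not thread-safe and not Lucene's API. */
class TermLruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    TermLruCache(int maxEntries) {
        // accessOrder = true: get() moves an entry to the most-recent end.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put(); evicts the least recently used entry
        // once the cache grows past its limit.
        return size() > maxEntries;
    }
}
```

Shared use from several search threads would need external 
synchronization, for example Collections.synchronizedMap, and that 
locking cost belongs in the benchmark too.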

