lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doug Cutting <cutt...@apache.org>
Subject Re: Caching of TermDocs
Date Tue, 27 Jul 2004 13:33:01 GMT
John Patterson wrote:
> I would like to hold a significant amount of the index in memory but use the
> disk index as a spill over.  Obviously the best situation is to hold in
> memory only the information that is likely to be used again soon.  It seems
> that caching TermDocs would allow popular search terms to be searched more
> efficiently while the less common terms would need to be read from disk.

The operating system already caches recent disk i/o.  So what you'd save 
primarily would be the overhead of parsing the data.  However the parsed 
form, a sequence of docNo and freq ints, is nearly eight times as large 
as its compressed size in the index.  So your cache would consume a lot 
of memory.

Whether it this provide much overall speedup depends on the distribution 
of common terms in your query traffic.  If you have a few terms that are 
searched very frequently then it might pay off.  In my experience with 
general-purpose search engines this is not usually the case: folks seem 
to use rarer words in queries than they do in ordinary text.  But in 
some search applications perhaps the traffic is more skewed.  Only some 
experiments would tell for sure.

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message