Subject: Re: a thought on cache
From: karl wettin <karl.wettin@gmail.com>
To: solr-user@lucene.apache.org
In-Reply-To: <Pine.LNX.4.58.0608032343150.3191@hal.rescomp.berkeley.edu>
References: <1154673229.5704.149.camel@localhost>
	 <Pine.LNX.4.58.0608032343150.3191@hal.rescomp.berkeley.edu>
Content-Type: text/plain
Organization: snigel heavy industries
Date: Fri, 04 Aug 2006 09:34:04 +0200

On Thu, 2006-08-03 at 23:53 -0700, Chris Hostetter wrote:

>   1) as new docs come in, add them to a purely in memory index
>   2) when it becomes time to "commit" the new documents, test all queries
>      in the cache against this in memory index.
>   3) any query in the cache which has a hit on this in memory index should
>      be invalidated, any query which does not have a hit is still valid.

You got it.
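
A minimal sketch of those three steps, in plain Python rather than
Lucene code. Everything here is hypothetical and only for illustration:
a document is reduced to a set of terms, a query is a predicate over one
document, and `QueryCache`, `term` and `all_of` are made-up names, not
Solr or Lucene APIs.

```python
from typing import Callable, Dict, FrozenSet, List, Optional

Doc = FrozenSet[str]            # assumption: a document is just its set of terms
Query = Callable[[Doc], bool]   # assumption: a query is a predicate over one doc


def term(t: str) -> Query:
    """Query that matches any document containing term t."""
    return lambda doc: t in doc


def all_of(*queries: Query) -> Query:
    """Conjunction: matches when every sub-query matches."""
    return lambda doc: all(q(doc) for q in queries)


class QueryCache:
    """Cached results keyed by a query string, plus the query itself."""

    def __init__(self) -> None:
        self._queries: Dict[str, Query] = {}
        self._results: Dict[str, List[int]] = {}

    def put(self, key: str, query: Query, doc_ids: List[int]) -> None:
        self._queries[key] = query
        self._results[key] = doc_ids

    def get(self, key: str) -> Optional[List[int]]:
        return self._results.get(key)

    def invalidate_matching(self, pending: List[Doc]) -> List[str]:
        """Steps 2 and 3: run every cached query against the in-memory
        batch of new docs and drop the entries that get a hit."""
        stale = [key for key, query in self._queries.items()
                 if any(query(doc) for doc in pending)]
        for key in stale:
            del self._queries[key]
            del self._results[key]
        return stale


if __name__ == "__main__":
    cache = QueryCache()
    cache.put("solr", term("solr"), [0, 3])
    cache.put("lucene AND cache", all_of(term("lucene"), term("cache")), [1])

    # Step 1: new docs are collected in memory until commit time.
    pending = [frozenset({"solr", "cache"})]

    print(cache.invalidate_matching(pending))   # ['solr']
    print(cache.get("lucene AND cache"))        # [1]
```

The cost at commit time is one pass of all cached queries over the
small pending batch only, instead of re-running them against the whole
index.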

> ...this could probably work if the index was purely additive 

> check if one of the cached queries matched on the deleted document

Hmm, didn't see that one coming. The quick and dirty way would be to
rebuild the deleted document from its original source and test the
cached queries against it. I'll have to think of a better solution
than that, though.

> the next segment merge could collapse doc ids above deleted docs which
> were totally unrelated to any docs that were added or deleted -- so
> you would think they are still valid even though the doc ids in the
> cache don't correspond to the same documents anymore.

This is not the first time I've thought about low-level hooks in the
index. If an optimize could report the doc id changes it makes, this
would not be a problem, would it?
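
To make the idea concrete, here is a rough sketch of what a cache could
do with such a report. It rests on two assumptions that are mine, not
something Lucene provides as far as this thread establishes: that a
(hypothetical) hook hands over the list of doc ids removed by the
merge, and that the merge compacts the remaining ids while keeping
their order. `remap_doc_ids` is a made-up helper name.

```python
from bisect import bisect_left
from typing import Iterable, List, Optional


def remap_doc_ids(cached_ids: Iterable[int],
                  deleted_ids: Iterable[int]) -> Optional[List[int]]:
    """Translate cached doc ids to their post-merge values.

    Assumption: every surviving doc id shifts down by the number of
    deleted ids below it. Returns None when the cached result contains
    a deleted document, meaning the entry has to be invalidated.
    """
    deleted = sorted(set(deleted_ids))
    remapped = []
    for doc_id in cached_ids:
        below = bisect_left(deleted, doc_id)   # deleted ids smaller than doc_id
        if below < len(deleted) and deleted[below] == doc_id:
            return None                        # the cached doc itself was deleted
        remapped.append(doc_id - below)
    return remapped


if __name__ == "__main__":
    # Docs 2 and 5 were deleted and the merge collapsed the gaps.
    print(remap_doc_ids([0, 4, 7], [2, 5]))   # [0, 3, 5]
    print(remap_doc_ids([0, 5, 7], [2, 5]))   # None
```

With a report like that, unrelated cache entries could be patched in
place instead of being thrown away after every merge.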

> while the "old" IndexSearcher is still being used by external requests
> (and still using it's cache) a new "on deck" IndexSearcher is opened,
> and an internal thread is running queries against it (the results of

I do something similar to that. But all those queries (in some cases
tens of thousands, against a frequently updated index) hog more CPU
than I think they have to. I'm low on CPU (it is spent on real-time
collaborative filtering etc.) but have a more or less unlimited amount
of RAM.