From: Arjen van der Meijden
Date: Fri, 30 Nov 2012 08:00:34 +0100
To: java-user@lucene.apache.org
Subject: Re: Does anyone have tips on managing cached filters?

We have something similar with documents that can be tagged (and have many other relations).
But for the purposes of search, we differ from your approach in two ways:

- We do actually index the relation's id (i.e. the tag's id) as part of the Lucene document, and update the document if the relation between the item and a tag changes. So a filter on some 'tag' becomes a trivial termsFilter.addTerm('tagId', '12345').
- We use Lucene only as a base for the results we're going to send back to the user, i.e. we get results from Lucene and then do some more processing on them.

But that last distinction is actually because we started with an in-memory "database" application that did basically what Lucene already does, just with more complicated objects, more complicated facet extraction, more complicated filters, etc. So Lucene is only used when we need keyword filtering, and we help Lucene do that quickly by offering it some Filters derived from the rest of the application's work.

And yes, if we were to redesign the application, it could become different :P

Best regards,

Arjen

On 29-11-2012 6:57 Trejkaz wrote:
> On Wed, Nov 28, 2012 at 6:28 PM, Robert Muir wrote:
>> My point is really that lucene (especially clear in 4.0) assumes
>> indexreaders are immutable points in time. I don't think it makes sense for
>> us to provide any e.g. filtercaching or similar otherwise, because this is
>> a key simplification to the design. If you depart from this, by scoring or
>> filtering from mutable stuff outside the inverted index, things are likely
>> going to get complicated.
>
> Whereas it would be lovely to live in a land of rainbows and unicorns
> where all the data you ever want to use is in the text index and all
> filters can be written as a query, that simply isn't the case for us
> and I very much doubt we're not the only ones in this situation.
>
> Sure, things are complicated. Anything except the most trivial forum
> search application is complicated.
>
> Well, the situation as it stands now is that when a filter is
> invalidated, it happens across all stores which are currently open.
> That means that results are at least correct, but after invalidating a
> filter, a little more work than necessary is required to populate the
> cache again. For certain filters (like word lists) this is necessary
> anyway, since adding a word might invalidate any store. For others
> like tags, I was hoping there would be some way to selectively
> invalidate only certain readers. But it seems like that isn't the
> case, so I will probably have to add a third level of caching to cache
> these sorts of filter per-store instead of globally.
>
> TX
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
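
For illustration only, the per-store caching described in the quoted message could be sketched roughly like this. It is plain Java with no Lucene dependency: PerStoreFilterCache, the String filter name, the store key type S, and the BitSet standing in for a cached filter result are all hypothetical names chosen for this sketch, not part of Lucene or of either application discussed here.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch of a filter cache keyed first by filter name and then
 * by store, so that one store's entry can be dropped without touching the
 * entries of the other open stores. None of these names come from Lucene.
 */
final class PerStoreFilterCache<S> {
    // filter name -> (store -> cached matching-document bits)
    private final Map<String, Map<S, BitSet>> cache = new HashMap<>();

    /** Returns the cached bits for (filterName, store), computing them on a miss. */
    synchronized BitSet get(String filterName, S store, Function<S, BitSet> loader) {
        return cache
            .computeIfAbsent(filterName, k -> new HashMap<>())
            .computeIfAbsent(store, loader);
    }

    /** Selective invalidation: e.g. a tag changed on an item in one store. */
    synchronized void invalidate(String filterName, S store) {
        Map<S, BitSet> perStore = cache.get(filterName);
        if (perStore != null) {
            perStore.remove(store);
        }
    }

    /** Global invalidation: e.g. a word list changed and may affect any store. */
    synchronized void invalidateAll(String filterName) {
        cache.remove(filterName);
    }
}
```

With something along these lines, invalidating a tag filter for one store leaves the cached entries for every other open store untouched, while word-list style filters can still call invalidateAll. A real application would probably also need weak keys or explicit cleanup so that entries go away when a store is closed.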