From: Shai Erera
To: java-dev@lucene.apache.org
Date: Wed, 3 Jun 2009 11:03:53 +0300
Subject: Re: Question on CachingWrapperFilter
Message-ID: <786fde50906030103y5767a039sca06c7a9a8ee1368@mail.gmail.com>
In-Reply-To: <200906022035.04263.paul.elschot@xs4all.nl>

Thanks, Paul!

I'll work on such a utility (which takes a Filter and reads it into an OpenBitSet or SortedVIntList) and then post back, in case you're interested in adopting it and changing CWF to use it, or something else.

Shai

On Tue, Jun 2, 2009 at 9:35 PM, Paul Elschot <paul.elschot@xs4all.nl> wrote:
On Tuesday 02 June 2009 16:39:06 Shai Erera wrote:
> Hi
>
> I read CWF today and initially I thought this is going to cache a Filter
> in-memory for me, so that I can more efficiently use it for subsequent
> searches. But I learned that all it does is cache the DocIdSet returned by
> the wrapped Filter.
>
> This is good in and of itself, but I wonder if we shouldn't go the extra
> mile and wrap stuff in memory for Filters which don't operate from memory.


It was good until QueryWrapperFilter returned a Scorer instead of a disi
based on an (Open)BitSet.


> For example - I have a Filter which reads information from a Payload as it's
> iterated on, so it doesn't keep anything in memory (it's per-user
> information, so I haven't decided yet if I can afford caching it in-memory
> and whether it will be beneficial). Caching that sort of Filter by CWF will
> obviously not improve anything.
>
> I'm not sure what to do here:
> 1. Just reflect that in the javadoc (it is very confusing saying "Wraps
> another filter's result and caches it", which is not true)
> 2. Introduce a class which takes a Filter and loads it into memory (I think
> I read an issue/discussion about this), to an OpenBitSet for example (but we
> need to know the number of results in advance, or grow the array as we go
> along).
> 3. Don't use CWF, write a "load-a-Filter-into-in-memory-Filter" utility, and
> cache the Filters w/ the user as Key.


For that, one could subclass CWF and override the docIdSetToCache method
to return an OpenBitSetDISI constructed from the given disi.
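
As a rough illustration of that idea, here is a minimal sketch in plain Java. It does not use Lucene's classes: java.util.BitSet stands in for the OpenBitSet, a PrimitiveIterator.OfInt stands in for the expensive disi, and the class name MaterializingFilterCache and all of its members are hypothetical. It assumes the source yields non-negative doc IDs and that the cache key is a reader-like object (or a user, as in option 3 of the quoted list).

```java
import java.util.BitSet;
import java.util.Map;
import java.util.PrimitiveIterator;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical stand-in, not a Lucene class: caches a materialized bit set per key. */
final class MaterializingFilterCache<K> {
    // Assumed to be expensive, e.g. an iterator that reads payloads or runs a query.
    private final Function<K, PrimitiveIterator.OfInt> expensiveSource;
    // Weak keys, so an entry can be collected once its key is no longer referenced.
    private final Map<K, BitSet> cache = new WeakHashMap<>();
    // Counts how often the expensive source was consulted (for illustration only).
    int sourceCalls = 0;

    MaterializingFilterCache(Function<K, PrimitiveIterator.OfInt> expensiveSource) {
        this.expensiveSource = expensiveSource;
    }

    /** Returns the cached bit set for the key, building it on first use. */
    synchronized BitSet get(K key) {
        BitSet cached = cache.get(key);
        if (cached == null) {
            sourceCalls++;
            cached = materialize(expensiveSource.apply(key));
            cache.put(key, cached);
        }
        return cached;
    }

    /** Drains the iterator once into an in-memory bit set. */
    static BitSet materialize(PrimitiveIterator.OfInt docIds) {
        BitSet bits = new BitSet();
        while (docIds.hasNext()) {
            bits.set(docIds.nextInt());
        }
        return bits;
    }
}
```

java.util.BitSet grows on demand, so in this sketch the number of results does not have to be known in advance. Callers are expected to treat the returned bit set as read-only.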


> I will probably need to do the second part of (3) anyway, so I'm asking
> whether such a utility is useful to exist in Lucene, and perhaps there's
> already one (I thought I read somewhere about the ability to execute a Query
> and get back a Filter, or use the results as a Filter)?


That is what QueryWrapperFilter does.


> I looked at
> QueryWrapperFilter, but it doesn't seem to give me what I need, since its
> getDocIdSet method returns an iterator which is the Scorer of the Query that
> it wraps.


The Scorer seems to be what you need, but there are cheaper disis, see below.


>
> Anyway, I think the documentation of CWF should be fixed and made clearer.
>
> Any thoughts?


The basic problem is that disis from DocIdSets come in two variations:
expensive ones, e.g. based on a query, and cheap ones, e.g. based on an
OpenBitSet or on a SortedVIntList.
One would normally want to cache a DocIdSet that provides a cheap disi.


For the javadocs of the current CWF it could be sufficient to mention more
prominently that the default CWF caches the given DocIdSet, basically
assuming that its disi is cheap.


But it might be a good idea to change the default implementation to check
whether the given DocIdSet is an OpenBitSet, cache that directly in that
case, and otherwise provide an OpenBitSetDISI.
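
The check proposed here could look roughly like the following sketch, written in plain Java with stand-in types rather than Lucene's own. DocSet, BitDocSet and toCacheable are hypothetical names, and java.util.BitSet plays the role of the OpenBitSet. A set that is already bit-backed is returned for caching as is; anything else is drained once into a bit set.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;

/** Hypothetical stand-in types; none of these are Lucene classes. */
final class CacheableSetSketch {
    /** Stand-in for a doc ID set: hands out an iterator over its doc IDs. */
    interface DocSet {
        PrimitiveIterator.OfInt iterator();
    }

    /** Cheap variant: iteration only walks bits that are already in memory. */
    static final class BitDocSet implements DocSet {
        final BitSet bits;

        BitDocSet(BitSet bits) {
            this.bits = bits;
        }

        @Override
        public PrimitiveIterator.OfInt iterator() {
            return bits.stream().iterator();
        }
    }

    /** Returns a set that is cheap to iterate repeatedly, copying only if needed. */
    static DocSet toCacheable(DocSet set) {
        if (set instanceof BitDocSet) {
            return set; // already cheap: cache it unchanged
        }
        // Possibly expensive source (e.g. query-backed): iterate it exactly once.
        BitSet bits = new BitSet();
        PrimitiveIterator.OfInt it = set.iterator();
        while (it.hasNext()) {
            bits.set(it.nextInt());
        }
        return new BitDocSet(bits);
    }
}
```

The type check keeps the cheap case free of copying, at the cost of one full iteration for every other kind of set.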


Regards,
Paul Elschot


