On Friday 10 August 2007 19:21, mark harwood wrote:
> >>I'll change the CachingWrapperFilter to use a BitSetFilter,
> >>which data structure would you like to have cached for filtering
> >>in the xml query parser?
>
> I think Filter/BitSetFilter might be the wrong choice of object to cache
> in there. It is the *data that Filters create* which is most important
> to cache (currently limited to a BitSet). This data object is what I
> chose to call a DocIdSet and implementations could be a BitSet,
> OpenBitSet or SortedVint structure in your new scheme.
I'll start with a BitSet initially, when nobody beats me to it.
> When using/reusing a DocIdSet (either as part of filtering or
> perhaps counting how search results have fallen into various
> categories) I proposed that client code should call
> DocIdSet.getIterator() to get hold of an iterator for their
> own one-time-only use in iterating across that set.
Right, the only thing left is then how to get a Matcher from this iterator.
> The Filters (a service I chose to call more generically
> "DocIdSetFactory") also need to be cached but only to be used
> as keys for equivalence checks (i.e. on a cached Filter only the
> methods hashcode/equals are called). The rationale for this is
> that the cached Filter is the object best placed to answer the question
> "is this new incoming Filter request the same as one I have already
> processed?". If the cached Filter matches the incoming request (i.e.
> the criteria is the same) then you can look for a cached DocIdSet held
> in a WeakHashMap keyed on IndexReader. If you have a cache hit then
> you call cachedDocIdSet.getIterator(), otherwise take the cost of calling
> newFilter.getDocIdSet(reader) and cache that result.
>
> This is effectively how the remote FilterManager and the XMLQueryParser
> filter caching stuff work with Filters/Bitsets today.
>
> Hope this makes sense.
It does make sense. I suppose the Filter criterium is a Lucene Query?
Anyway, I'll let it sink in over my holidays :)
Regards,
Paul Elschot
>
>
>
>
> ----- Original Message ----
> From: Paul Elschot <paul.elschot@xs4all.nl>
> To: java-dev@lucene.apache.org
> Sent: Friday, 10 August, 2007 5:31:02 PM
> Subject: Re: Fwd: Decouple Filter from BitSet: API change and xml query
parser
>
>
>
> On Friday 10 August 2007 13:12, mark harwood wrote:
> > >>Could someone give me a clue as to why the test case
> TestRemoteCachingWrapperFilter fails with the patch applied?
> >
> > Regardless of the reasons for this particular test failure, this code is
not
> safe in other ways which the test cases don't test for.
> >
> > To restate the issue: Matcher is not designed to be threadsafe
>
> A Matcher is almost a Scorer, the only difference is that it does not
> have a score() method. Scorers are not threadsafe, they are used
> once during a query search. The intention is to use Matchers
> in the same way: once during a query search in case no score value
> is needed.
>
> > and CachingWrapperFilter (or any other example of existing
> > caching strategies) cannot therefore simply be changed to
> > cache Matchers in place of the existing scheme of caching bitsets
> > (which are currently used in a thread-safe manner by all Lucene code).
> > Bitsets don't offer the notion of a cursor (required for "next"
> > methods) while Matcher does which spoils it's potential for
> > reuse/shared use.
>
> The idea is not to cache the Matchers, but the underlying data structure.
>
> > The remoting test code you refer to uses your modified
> > CachingWrapperFilter which has swapped Matchers for BitSets
> > and so I would anticipate thread safety issues if the tests actually
> > tried to share/reuse the same Matcher.
>
> Thanks for taking a look at the code.
> I'll change the CachingWrapperFilter to use a BitSetFilter,
> and then hopefully more test cases will pass.
>
> > >>Finally, are DocIdSet and DocIdSetIterator currently part of Lucene? I
> don't know how to go about these.
> >
> > These are two of the names I gave to a notional set of 3 services that I
> outlined here:
> >
> > https://issues.apache.org/jira/browse/LUCENE-584#action_12518642
> >
> > I introduced this terminology to the discussion because:
> > 1) It describes 2 services that are currently combined in Matcher
> > that I feel need to be separated
>
> The idea of Matcher is that it is a Scorer without a score() method,
> and no more.
>
> > 2) It uses a more generic description of the services offered that can be
> useful when considering other applications of the services (e.g. category
> count and filtering logic both can use cached sets of doc IDs. DocIdSet
> seemed to describe the service more generically than "Matcher")
> >
> > I'm happy to drop use of these terms from this discussion if you
> > feel they are not useful.
>
> I think that DocIdSet has the role of the underlying data structure that
> would be cached, and that DocIdSetIterator is something very close
> to Matcher or even the same thing.
>
> Which brings me to another question: which data structure would
> you like to have cached for filtering in the xml query parser?
> I think initially BitSet would do nicely, but one could also take
> the opportunity to use more compact data structures when possible.
>
>
> Finally one of the examples classes I gave is incomplete, see below.
> I wrote:
> >
> ...
> > As for the API change, how to move from the current:
> >
> > public class Filter {
> > abstract public BitSet bits(IndexReader);
> > }
> >
> > to:
> >
> > public class Filter {
> > abstract public Matcher getMatcher(IndexReader);
> > }
> >
> > The patch proposes to do this by moving all current use of Filter to
> > BitSetFilter:
> >
> > public class BitSetFilter extends Filter {
> > abstract public BitSet bits(IndexReader);
>
> // BitSetFilter also has:
>
> public Matcher getMatcher(IndexReader reader) {
> return DefaultMatcher.defaultMatcher(bits());
> }
>
> > }
>
> Regards,
> Paul Elschot
>
>
> >
> > Would it be good to have an intermediate version of Filter like this
> > one:
> >
> > public class Filter {
> > /** deprecated, use class BitSetFilter instead */
> > public BitSet bits(IndexReader); {return null;}
> > abstract public Matcher getMatcher(IndexReader);
> > }
> >
> >
> ...
> >
> >
> > Regards,
> > Paul Elschot
> >
> >
> >
> >
> >
> >
> >
> >
> > ---------- Forwarded Message ----------
> >
> > Subject: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet
> > Date: Friday 10 August 2007 01:15
> > From: "Mark Harwood (JIRA)" <jira@apache.org>
> > To: java-dev@lucene.apache.org
> >
> >
> >
> >
>
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518868
]
> >
> > Mark Harwood commented on LUCENE-584:
> > -------------------------------------
> >
> > OK, I appreciate caching may not be a top priority in this proposal but I
> have
> > live systems in production using XMLQueryParser and which use the existing
> > core facilities for caching. As it stands this proposal breaks this
> > functionality (see "FIXME" in contrib's CachedFilterBuilder and my
concerns
> > over use of unthreadsafe Matcher in the core class CachingWrapperFilter)
> >
> > I am obviously concerned by this and keen to help shape a solution which
> > preserves the existing capabilities while adding your new functionality.
I'm
> > not sure I share your view that support for caching can be treated as a
> > separate issue to be dealt with at a later date. There are a larger number
> of
> > changes proposed in this patch and if the design does not at least
consider
> > future caching issues now, I suspect much will have to be reworked later.
> The
> > change I can envisage most clearly is expressed in my concern that the
> > DocIdSet and DocIdSetIterator services I outlined are being combined in
> > Matcher as it stands now and these functions will have to be separated.
> >
> > Cheers
> > Mark
> >
> ...
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
>
>
>
>
> ___________________________________________________________
> Yahoo! Answers - Got a question? Someone out there knows the answer. Try it
> now.
> http://uk.answers.yahoo.com/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
|