lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eran Sevi" <erans...@gmail.com>
Subject Re: Filtering a SpanQuery
Date Mon, 12 May 2008 07:06:36 GMT
Thanks Paul,

I'll give your code sample a try.
I still think that calling getSpans (the first line of code) that returns
millions of results is going to be much slower than calling getSpans that's
going to return only a few thousands of results. Since the filtering is only
performed after calling this method it can't help in this case.

I guess your suggested solution is my best option without changing the way
getSpans works (which I'm not going to change any time soon )
Eran.
On Wed, May 7, 2008 at 7:22 PM, Paul Elschot <paul.elschot@xs4all.nl> wrote:

> Op Wednesday 07 May 2008 10:18:38 schreef Eran Sevi:
> > Thanks Paul for your reply,
> >
> > Since my index contains a couple of millions documents and the filter
> > is supposed to limit the search space to a few thousands I was hoping
> > I won't have to do the filtering myself after running the query on
> > all the index.
>
> The code I gave earlier effectively does a filtered query search
> on the index. It visits the resulting Spans, and does not provide
> a score value per document as SpanScorer would do.
> Please make sure to test that code thoroughly for reliable results.
>
> >
> > Maybe this is the case anyway and behind the scenes the filter does
> > exactly what you suggested.
>
> Yes, a filtered query search would use skipTo() on the Spans via
> SpanScorer. But the difference between the normal case
> and your case is that you don't need SpanScorer.
>
> > From what I tested the number of results of the SpanQuery greatly
> > affects the running speed so if I'm going to use about 0.1% of the
> > results I'm loosing a lot of time and memory for gathering and
> > storing the spans I'm not going to use.
> >
> > I don't know how SpanQuery works internally but I guess that if the
> > filter is known beforehand,
>
> A Filter needs to make a BitSet available before the query search.
>
> > it could speed things up quite a bit.
>
> I would expect a substantial speedup from using skipTo() on the
> Spans when only 0.1% of the results passes the filter.
>
> Regards,
> Paul Elschot
>
> > Eran.
> >
> >
> > On Wed, May 7, 2008 at 10:34 AM, Paul Elschot
> > <paul.elschot@xs4all.nl>
> >
> > wrote:
> > > Op Tuesday 06 May 2008 17:39:38 schreef Paul Elschot:
> > > > Eran,
> > > >
> > > > Op Tuesday 06 May 2008 10:15:10 schreef Eran Sevi:
> > > > > Hi,
> > > > >
> > > > > I am looking for a way to filter a SpanQuery according to some
> > > > > other query (on another field from the one used for the
> > > > > SpanQuery). I need to get access to the spans themselves of
> > > > > course. I don't care about the scoring of the filter results
> > > > > and just need the positions of hits found in the documents that
> > > > > matches the filter.
> > > >
> > > > I think you'll have to implement the filtering on the Spans
> > > > yourself. That's not really difficult, just use Spans.skipTo().
> > > > The code to do that could look sth like this (untested):
> > > >
> > > > Spans spans = yourSpanQuery.getSpans(reader);
> > > > BitSet bits = yourFilter.bits(reader);
> > > > int filterDoc = bits.nextSetBit(0);
> > > > while ((filterDoc >= 0) and spans.skipTo(filterDoc)) {
> > > >   boolean more = true;
> > > >   while (more and (spans.doc() == filterDoc)) {
> > > >      // use spans.start() and spans.end() here
> > > >      // ...
> > > >      more = spans.next();
> > > >   }
> > > >   if (! more) {
> > > >     break;
> > > >   }
> > > >   filterDoc = bits.nextSetBit(spans.doc());
> > >
> > > At this point, no skipping on the spans should be done when
> > > filterDoc equals spans.doc(), so this code still needs some work.
> > > But I think you get the idea.
> > >
> > > Regards,
> > > Paul Elschot
> > >
> > > > }
> > > >
> > > > Please check the javadocs of java.util.BitSet, there may
> > > > be a 1 off error in the arguments to nextSetBit().
> > > >
> > > > Regards,
> > > > Paul Elschot
> > > >
> > > > > I tried looking through the archives and found some reference
> > > > > to a SpanQueryFilter patch, however I don't see how it can help
> > > > > me achieve what I want to do. This class receives only one
> > > > > query parameter (which I guess is the actual query) and not a
> > > > > query and a filter for example.
> > > > >
> > > > > Any help about how I can achieve this will be appreciated.
> > > > >
> > > > > Thanks,
> > > > > Eran.
> > > >
> > > > -----------------------------------------------------------------
> > > >---- To unsubscribe, e-mail:
> > > > java-user-unsubscribe@lucene.apache.org For additional commands,
> > > > e-mail: java-user-help@lucene.apache.org
> > >
> > > -------------------------------------------------------------------
> > >-- To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message