lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Filtering a SpanQuery
Date Wed, 07 May 2008 16:22:23 GMT
Op Wednesday 07 May 2008 10:18:38 schreef Eran Sevi:
> Thanks Paul for your reply,
>
> Since my index contains a couple of millions documents and the filter
> is supposed to limit the search space to a few thousands I was hoping
> I won't have to do the filtering myself after running the query on
> all the index.

The code I gave earlier effectively does a filtered query search
on the index. It visits the resulting Spans, and does not provide
a score value per document as SpanScorer would do.
Please make sure to test that code thoroughly for reliable results.

>
> Maybe this is the case anyway and behind the scenes the filter does
> exactly what you suggested.

Yes, a filtered query search would use skipTo() on the Spans via
SpanScorer. But the difference between the normal case
and your case is that you don't need SpanScorer.

> From what I tested the number of results of the SpanQuery greatly
> affects the running speed so if I'm going to use about 0.1% of the
> results I'm loosing a lot of time and memory for gathering and
> storing the spans I'm not going to use.
>
> I don't know how SpanQuery works internally but I guess that if the
> filter is known beforehand,

A Filter needs to make a BitSet available before the query search.

> it could speed things up quite a bit. 

I would expect a substantial speedup from using skipTo() on the
Spans when only 0.1% of the results passes the filter.

Regards,
Paul Elschot

> Eran.
>
>
> On Wed, May 7, 2008 at 10:34 AM, Paul Elschot
> <paul.elschot@xs4all.nl>
>
> wrote:
> > Op Tuesday 06 May 2008 17:39:38 schreef Paul Elschot:
> > > Eran,
> > >
> > > Op Tuesday 06 May 2008 10:15:10 schreef Eran Sevi:
> > > > Hi,
> > > >
> > > > I am looking for a way to filter a SpanQuery according to some
> > > > other query (on another field from the one used for the
> > > > SpanQuery). I need to get access to the spans themselves of
> > > > course. I don't care about the scoring of the filter results
> > > > and just need the positions of hits found in the documents that
> > > > matches the filter.
> > >
> > > I think you'll have to implement the filtering on the Spans
> > > yourself. That's not really difficult, just use Spans.skipTo().
> > > The code to do that could look sth like this (untested):
> > >
> > > Spans spans = yourSpanQuery.getSpans(reader);
> > > BitSet bits = yourFilter.bits(reader);
> > > int filterDoc = bits.nextSetBit(0);
> > > while ((filterDoc >= 0) and spans.skipTo(filterDoc)) {
> > >   boolean more = true;
> > >   while (more and (spans.doc() == filterDoc)) {
> > >      // use spans.start() and spans.end() here
> > >      // ...
> > >      more = spans.next();
> > >   }
> > >   if (! more) {
> > >     break;
> > >   }
> > >   filterDoc = bits.nextSetBit(spans.doc());
> >
> > At this point, no skipping on the spans should be done when
> > filterDoc equals spans.doc(), so this code still needs some work.
> > But I think you get the idea.
> >
> > Regards,
> > Paul Elschot
> >
> > > }
> > >
> > > Please check the javadocs of java.util.BitSet, there may
> > > be a 1 off error in the arguments to nextSetBit().
> > >
> > > Regards,
> > > Paul Elschot
> > >
> > > > I tried looking through the archives and found some reference
> > > > to a SpanQueryFilter patch, however I don't see how it can help
> > > > me achieve what I want to do. This class receives only one
> > > > query parameter (which I guess is the actual query) and not a
> > > > query and a filter for example.
> > > >
> > > > Any help about how I can achieve this will be appreciated.
> > > >
> > > > Thanks,
> > > > Eran.
> > >
> > > -----------------------------------------------------------------
> > >---- To unsubscribe, e-mail:
> > > java-user-unsubscribe@lucene.apache.org For additional commands,
> > > e-mail: java-user-help@lucene.apache.org
> >
> > -------------------------------------------------------------------
> >-- To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message