lucene-java-user mailing list archives

From "Eran Sevi" <erans...@gmail.com>
Subject Re: Filtering a SpanQuery
Date Wed, 07 May 2008 08:18:38 GMT
Thanks for your reply, Paul,

Since my index contains a couple of million documents and the filter is
supposed to limit the search space to a few thousand, I was hoping I wouldn't
have to do the filtering myself after running the query on the whole index.

Maybe that is the case anyway, and behind the scenes the filter does exactly
what you suggested.

From what I tested, the number of results of the SpanQuery greatly affects
the running speed, so if I'm going to use only about 0.1% of the results, I'm
losing a lot of time and memory gathering and storing spans I'm not
going to use.

I don't know how SpanQuery works internally, but I guess that if the filter
were known beforehand, it could speed things up quite a bit.
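
A minimal sketch of the filtered iteration discussed in this thread, including the correction Paul mentions in his follow-up. To run without Lucene, it uses a hypothetical `SimpleSpans` interface that mirrors only the Lucene 2.x `Spans` methods Paul's code calls (`next()`, `skipTo()`, `doc()`, `start()`, `end()`); `SimpleSpans`, `ArraySpans` and `FilteredSpans` are illustrative names, not Lucene classes. It assumes, as we read the Lucene 2.x `Spans` javadoc, that `skipTo(target)` always moves past the current match before looking for a document >= target. That is why the loop only calls `skipTo` when the spans are behind the filter. It also relies on `java.util.BitSet.nextSetBit(from)` being inclusive of `from`. Whether skipping actually avoids reading positions in unfiltered documents depends on the real `Spans` implementation.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical stand-in for the Lucene 2.x Spans methods used in this thread. */
interface SimpleSpans {
    boolean next();
    boolean skipTo(int target); // assumed: always advances past the current match
    int doc();
    int start();
    int end();
}

/** Array-backed SimpleSpans for illustration; entries are {doc, start, end}, sorted. */
class ArraySpans implements SimpleSpans {
    private final int[][] spans;
    private int pos = -1; // unpositioned until next() or skipTo() is called

    ArraySpans(int[][] spans) { this.spans = spans; }

    public boolean next() { return ++pos < spans.length; }

    public boolean skipTo(int target) {
        // Mirrors the "behaves as if written" contract: do next() while doc < target.
        do {
            if (!next()) return false;
        } while (doc() < target);
        return true;
    }

    public int doc()   { return spans[pos][0]; }
    public int start() { return spans[pos][1]; }
    public int end()   { return spans[pos][2]; }
}

class FilteredSpans {
    /** Returns {doc, start, end} for every span whose doc is set in filterBits. */
    static List<int[]> collect(SimpleSpans spans, BitSet filterBits) {
        List<int[]> hits = new ArrayList<>();
        int filterDoc = filterBits.nextSetBit(0);
        if (filterDoc < 0 || !spans.skipTo(filterDoc)) return hits;
        while (true) {
            int spanDoc = spans.doc();
            if (spanDoc < filterDoc) {
                // Spans are behind the filter: skipping past the current span is safe.
                if (!spans.skipTo(filterDoc)) break;
            } else if (spanDoc > filterDoc) {
                // Filter is behind: nextSetBit is inclusive, so it may return spanDoc itself.
                filterDoc = filterBits.nextSetBit(spanDoc);
                if (filterDoc < 0) break;
            } else {
                // Same document: use the span, then step with next(), never skipTo().
                hits.add(new int[] {spans.doc(), spans.start(), spans.end()});
                if (!spans.next()) break;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[][] data = {{1, 0, 2}, {2, 3, 5}, {2, 7, 9}, {4, 1, 2}, {5, 0, 1}, {7, 4, 6}};
        BitSet bits = new BitSet();
        bits.set(2);
        bits.set(5);
        bits.set(6);
        for (int[] h : collect(new ArraySpans(data), bits)) {
            System.out.println(h[0] + ":" + h[1] + "-" + h[2]); // prints 2:3-5, 2:7-9, 5:0-1
        }
    }
}
```

The loop is a leapfrog intersection: whichever side (spans or filter) is on the lower document number advances to the other, so spans in documents outside the filter are never handed to the caller.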

Eran.


On Wed, May 7, 2008 at 10:34 AM, Paul Elschot <paul.elschot@xs4all.nl>
wrote:

> On Tuesday 06 May 2008 17:39:38, Paul Elschot wrote:
> > Eran,
> >
> > On Tuesday 06 May 2008 10:15:10, Eran Sevi wrote:
> > > Hi,
> > >
> > > I am looking for a way to filter a SpanQuery according to some
> > > other query (on a different field from the one used for the SpanQuery).
> > > I need access to the spans themselves, of course.
> > > I don't care about the scoring of the filter results; I just need
> > > the positions of hits found in the documents that match the
> > > filter.
> >
> > I think you'll have to implement the filtering on the Spans yourself.
> > That's not really difficult, just use Spans.skipTo().
> > The code to do that could look something like this (untested):
> >
> > Spans spans = yourSpanQuery.getSpans(reader);
> > BitSet bits = yourFilter.bits(reader);
> > int filterDoc = bits.nextSetBit(0);
> > while ((filterDoc >= 0) && spans.skipTo(filterDoc)) {
> >   boolean more = true;
> >   while (more && (spans.doc() == filterDoc)) {
> >      // use spans.start() and spans.end() here
> >      // ...
> >      more = spans.next();
> >   }
> >   if (! more) {
> >     break;
> >   }
> >   filterDoc = bits.nextSetBit(spans.doc());
>
> At this point, no skipping on the spans should be done when filterDoc
> equals spans.doc(), so this code still needs some work.
> But I think you get the idea.
>
> Regards,
> Paul Elschot
>
>
> > }
> >
> > Please check the javadocs of java.util.BitSet; there may
> > be an off-by-one error in the arguments to nextSetBit().
> >
> > Regards,
> > Paul Elschot
> >
> > > I tried looking through the archives and found a reference to a
> > > SpanQueryFilter patch; however, I don't see how it can help me
> > > achieve what I want to do. This class receives only one query
> > > parameter (which I guess is the actual query), not, for example,
> > > a query and a filter.
> > >
> > > Any help on how I can achieve this would be appreciated.
> > >
> > > Thanks,
> > > Eran.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
