lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Filtering a SpanQuery
Date Mon, 12 May 2008 22:17:00 GMT
Op Monday 12 May 2008 09:06:36 schreef Eran Sevi:
> Thanks Paul,
>
> I'll give your code sample a try.
> I still think that calling getSpans (the first line of code) that
> returns millions of results is going to be much slower than calling
> getSpans that's going to return only a few thousands of results.
> Since the filtering is only performed after calling this method it
> can't help in this case.

The given untested code (apart from possible bugs) would process
exactly as many documents and their Spans as SpanScorer would
in the explicitly filtered case.

> I guess your suggested solution is my best option without changing
> the way getSpans works (which I'm not going to change any time soon )

Before doing that, have a look at the code of SpanWeight/SpanScorer,
ConjunctionScorer, and the filtering code in IndexSearcher.

Regards,
Paul Elschot.



> Eran.
>
> On Wed, May 7, 2008 at 7:22 PM, Paul Elschot <paul.elschot@xs4all.nl> 
wrote:
> > Op Wednesday 07 May 2008 10:18:38 schreef Eran Sevi:
> > > Thanks Paul for your reply,
> > >
> > > Since my index contains a couple of millions documents and the
> > > filter is supposed to limit the search space to a few thousands I
> > > was hoping I won't have to do the filtering myself after running
> > > the query on all the index.
> >
> > The code I gave earlier effectively does a filtered query search
> > on the index. It visits the resulting Spans, and does not provide
> > a score value per document as SpanScorer would do.
> > Please make sure to test that code thoroughly for reliable results.
> >
> > > Maybe this is the case anyway and behind the scenes the filter
> > > does exactly what you suggested.
> >
> > Yes, a filtered query search would use skipTo() on the Spans via
> > SpanScorer. But the difference between the normal case
> > and your case is that you don't need SpanScorer.
> >
> > > From what I tested the number of results of the SpanQuery greatly
> > > affects the running speed so if I'm going to use about 0.1% of
> > > the results I'm loosing a lot of time and memory for gathering
> > > and storing the spans I'm not going to use.
> > >
> > > I don't know how SpanQuery works internally but I guess that if
> > > the filter is known beforehand,
> >
> > A Filter needs to make a BitSet available before the query search.
> >
> > > it could speed things up quite a bit.
> >
> > I would expect a substantial speedup from using skipTo() on the
> > Spans when only 0.1% of the results passes the filter.
> >
> > Regards,
> > Paul Elschot
> >
> > > Eran.
> > >
> > >
> > > On Wed, May 7, 2008 at 10:34 AM, Paul Elschot
> > > <paul.elschot@xs4all.nl>
> > >
> > > wrote:
> > > > Op Tuesday 06 May 2008 17:39:38 schreef Paul Elschot:
> > > > > Eran,
> > > > >
> > > > > Op Tuesday 06 May 2008 10:15:10 schreef Eran Sevi:
> > > > > > Hi,
> > > > > >
> > > > > > I am looking for a way to filter a SpanQuery according to
> > > > > > some other query (on another field from the one used for
> > > > > > the SpanQuery). I need to get access to the spans
> > > > > > themselves of course. I don't care about the scoring of the
> > > > > > filter results and just need the positions of hits found in
> > > > > > the documents that matches the filter.
> > > > >
> > > > > I think you'll have to implement the filtering on the Spans
> > > > > yourself. That's not really difficult, just use
> > > > > Spans.skipTo(). The code to do that could look sth like this
> > > > > (untested):
> > > > >
> > > > > Spans spans = yourSpanQuery.getSpans(reader);
> > > > > BitSet bits = yourFilter.bits(reader);
> > > > > int filterDoc = bits.nextSetBit(0);
> > > > > while ((filterDoc >= 0) and spans.skipTo(filterDoc)) {
> > > > >   boolean more = true;
> > > > >   while (more and (spans.doc() == filterDoc)) {
> > > > >      // use spans.start() and spans.end() here
> > > > >      // ...
> > > > >      more = spans.next();
> > > > >   }
> > > > >   if (! more) {
> > > > >     break;
> > > > >   }
> > > > >   filterDoc = bits.nextSetBit(spans.doc());
> > > >
> > > > At this point, no skipping on the spans should be done when
> > > > filterDoc equals spans.doc(), so this code still needs some
> > > > work. But I think you get the idea.
> > > >
> > > > Regards,
> > > > Paul Elschot
> > > >
> > > > > }
> > > > >
> > > > > Please check the javadocs of java.util.BitSet, there may
> > > > > be a 1 off error in the arguments to nextSetBit().
> > > > >
> > > > > Regards,
> > > > > Paul Elschot
> > > > >
> > > > > > I tried looking through the archives and found some
> > > > > > reference to a SpanQueryFilter patch, however I don't see
> > > > > > how it can help me achieve what I want to do. This class
> > > > > > receives only one query parameter (which I guess is the
> > > > > > actual query) and not a query and a filter for example.
> > > > > >
> > > > > > Any help about how I can achieve this will be appreciated.
> > > > > >
> > > > > > Thanks,
> > > > > > Eran.
> > > > >
> > > > > -------------------------------------------------------------
> > > > >---- ---- To unsubscribe, e-mail:
> > > > > java-user-unsubscribe@lucene.apache.org For additional
> > > > > commands, e-mail: java-user-help@lucene.apache.org
> > > >
> > > > ---------------------------------------------------------------
> > > >---- -- To unsubscribe, e-mail:
> > > > java-user-unsubscribe@lucene.apache.org For additional
> > > > commands, e-mail: java-user-help@lucene.apache.org
> >
> > -------------------------------------------------------------------
> >-- To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message