lucene-java-user mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Span query performance issue
Date Sat, 25 Jun 2005 08:49:56 GMT
On Saturday 25 June 2005 04:26, jian chen wrote:
> Hi,
> 
> I think Span query in general has to do more work than a simple Phrase
> query. Phrase query, in its simplest form, only needs to find
> terms that are adjacent to each other. Span query, meanwhile, does not
> require the terms to be adjacent; there can be other words in between.
> 
> Therefore, I think Span query can be expected to be slower than Phrase query.
> That said, Span query is way more powerful than Phrase query.
> 
> Jian
> 
> On 25 Jun 2005 00:00:18 -0000, yahootintin.11533894@bloglines.com
> <yahootintin.11533894@bloglines.com> wrote:
> > Hi,
> > 
> > I'm comparing SpanNearQuery to PhraseQuery results and noticing about
> > an 8x difference on Linux.  Is a SpanNearQuery doing 8x as much work?
> > 
> > 
> > I'm considering diving into the code if the results sound unusual to
> > people. But if it's really doing that much more work, I won't spend time
> > optimizing something that can't get much faster.

The main difference is in the extra generality of Spans over positions.
Spans have a begin position and an end position.
Matching two Spans for the terms of a phrase requires testing both
their begin positions and their end positions, even though for a single
term these differ only by a constant.
Spans also carry around their current document number, and this may
involve some more redundancy when finding the matches
within a single document.
Also, for exact matches (zero slop) PhraseQuery uses a separate scorer
that takes full advantage of the special case.
So, when the generality of the Spans is not needed, one should always
try to use a PhraseQuery.
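
The redundancy can be illustrated with a deliberately naive sketch in plain Java. It is a hypothetical model of matching a two-term phrase inside one document, not Lucene's PhraseScorer or Spans code; the class PhraseVsSpans and its methods are made-up names. The position version needs one lookup per occurrence of the first term, while the span version builds begin/end pairs (where end is always begin + 1 for a single term) and consults both for every candidate pair, which buys it slop support.

```java
import java.util.Arrays;

// Hypothetical illustration, not Lucene source: matching a two-term phrase
// within one document, once with plain positions and once with spans.
class PhraseVsSpans {

    // A single-term span: end is always start + 1, so storing both is redundant.
    static final class Span {
        final int start;
        final int end;
        Span(int position) { this.start = position; this.end = position + 1; }
    }

    // Exact phrase over positions: one lookup per occurrence of the first term.
    // secondPositions must be sorted ascending for binarySearch.
    static int countExactPhrase(int[] firstPositions, int[] secondPositions) {
        int matches = 0;
        for (int p : firstPositions) {
            if (Arrays.binarySearch(secondPositions, p + 1) >= 0) {
                matches++;
            }
        }
        return matches;
    }

    // In-order span match with slop: the end of the first span and the start
    // of the second span are both consulted for every candidate pair.
    static int countOrderedSpans(int[] firstPositions, int[] secondPositions, int slop) {
        int matches = 0;
        for (int p : firstPositions) {
            Span a = new Span(p);
            for (int q : secondPositions) {
                Span b = new Span(q);
                int gap = b.start - a.end;   // unmatched positions between the spans
                if (gap >= 0 && gap <= slop) {
                    matches++;
                }
            }
        }
        return matches;
    }
}
```

With positions {1, 7} for the first term and {2, 9} for the second, both methods report one match at slop 0, and the span version reports two at slop 1.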

I'm not surprised that SpanNearQuery is slower than PhraseQuery,
and I'd expect a factor of 3-4 between them. The factor of 8 might indicate that
there is some room for improvement in the span package.
(I'd expect the CellQueue in NearSpans to be the bottleneck.)
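
To make the suspected bottleneck concrete, here is a hypothetical sketch of a priority-queue driven, unordered near match inside one document. It follows the general idea of a queue of per-term cells but is not Lucene's NearSpans or CellQueue code; NearMatcher, Cell and countNearMatches are made-up names. Each step polls the cell with the smallest start, checks whether the window from that start to the largest end has at most `slop` unmatched positions, then advances that cell and re-inserts it. Every term position therefore costs an O(log k) queue operation for k terms, which is one plausible source of the extra overhead.

```java
import java.util.PriorityQueue;

// Hypothetical sketch of a priority-queue driven unordered near match within
// one document. Inspired by the idea of NearSpans' CellQueue, not copied from it.
class NearMatcher {

    // One cell per query term: a cursor over that term's sorted positions.
    static final class Cell {
        final int[] positions;
        int index = 0;

        Cell(int[] positions) { this.positions = positions; }

        int start() { return positions[index]; }
        int end() { return positions[index] + 1; }   // single-term span
        boolean next() { return ++index < positions.length; }
    }

    // Counts the steps at which the window spanned by all current cells has at
    // most `slop` unmatched positions. Each termPositions[i] must be sorted ascending.
    static int countNearMatches(int[][] termPositions, int slop) {
        if (termPositions.length == 0) {
            return 0;
        }
        PriorityQueue<Cell> queue =
            new PriorityQueue<>((x, y) -> Integer.compare(x.start(), y.start()));
        int maxEnd = Integer.MIN_VALUE;
        for (int[] positions : termPositions) {
            if (positions.length == 0) {
                return 0;                                // a missing term never matches
            }
            Cell cell = new Cell(positions);
            queue.add(cell);
            maxEnd = Math.max(maxEnd, cell.end());
        }
        int totalLength = termPositions.length;          // each span has length 1
        int matches = 0;
        while (true) {
            Cell first = queue.poll();                   // cell with the smallest start
            int unmatched = (maxEnd - first.start()) - totalLength;
            if (unmatched <= slop) {
                matches++;
            }
            if (!first.next()) {
                return matches;                          // one term exhausted: stop
            }
            maxEnd = Math.max(maxEnd, first.end());      // ends only ever grow
            queue.add(first);                            // re-insert with its new start
        }
    }
}
```

For example, with term positions {1, 7} and {2, 9}, countNearMatches returns 1 at slop 0 and 2 at slop 1. Polling before advancing keeps the queue ordering valid, since a cell is never mutated while it sits in the queue.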

Regards,
Paul Elschot



