lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: SpanQuery parser? Update (ugly hack inside...)
Date Tue, 08 Nov 2005 08:28:38 GMT
On Tuesday 08 November 2005 00:05, Sean O'Connor wrote:
...
> When I looked at using surround queries, I believe I got stuck finding 
> access to the spans information. I was pressed for time, and only looked 

They are in the org.apache.lucene.search.spans package, not in the surround
language.

> it over for a couple of hours. A good portion of that was setup time. I 
> probably missed some of the more obvious points, so I will go back and 
> look over the surround code to see how unencumbered by the thought 
> process I was when I decided to reinvent my wheel.
> 
> >Also, PhraseQuery is more efficient than a combination of SpanTermQuery,
> >SpanOrQuery and SpanNearQuery, so perhaps PhraseQuery should have
> >a getSpans() method so it could be used as a SpanQuery, too.
> >  
> >
> This intrigues me. I know I don't understand Lucene past the basics, so 
> I'm a little lost on the implications of your suggestion.
> My impression was (and still is) that PhraseQuery is a significantly 
> different implementation from the SpanQuery family. Is it rather 
> straightforward to implement getSpans() for a PhraseQuery? Granted, I 
> seem to have missed that Surround is based on SpanQueries, so it looks 
> like I need to spend some more time getting my head straight on these 
> topics.

This is mainly a performance issue, so when SpanNearQuery works
for you, and you can live with the slower performance of SpanNearQuery
over PhraseQuery, you can use the surround parser as it is.
The surround parser does not generate PhraseQueries,
but it will generate SpanNearQueries, and it can also nest
SpanNearQueries.

Implementing getSpans() on PhraseQuery may not be straightforward,
because of the differences in the meanings of the slop for PhraseQueries
and SpanNearQueries.

PhraseQuery is built directly on TermPositions as returned from
an IndexReader. In the SpanQuery family, there is a Spans
between the TermPositions of the terms and a SpanNearQuery.
This allows SpanNearQueries to be nested, unlike PhraseQueries.

The current implementation of SpanNearQuery is slower than
PhraseQuery, but the implementation of SpanNearQuery here may 
narrow the gap:
http://issues.apache.org/jira/browse/LUCENE-413
Another way to improve performance might be to implement
a getTermSpans() method directly on the IndexReader.

Regards,
Paul Elschot


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message