lucene-java-user mailing list archives

From "Rajesh Munavalli" <findm...@gmail.com>
Subject Re: Phrase query vs span query
Date Wed, 22 Feb 2006 15:39:49 GMT
On 2/22/06, Paul Elschot <paul.elschot@xs4all.nl> wrote:
>
> >
> > Typical Query:
> > ---------------------
> > Consists of 15 to 30 query terms. In other words, these query terms
> > represent a conceptual section.
>
> Would you need synonyms of these terms, too?


Yes.


> > (2) After considering the way different queries work and their limitations,
> > I think forming phrase/span queries of groups of query terms
> > might approximate the rankings I am expecting. In that case, which of the
> > following queries will perform better (in terms of QUERY SPEED and RANKING)?
> >               (a) phrase query with a certain slop factor
> >               (b) span query
>
> SpanQuery is slower than PhraseQuery, but it has the advantage that it can
> be nested. Nesting here means the possibility to use e.g. a short phrase as
> a unit to be matched and scored.
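
As a rough illustration of what nesting means, here is a minimal pure-Java sketch. It is not Lucene's implementation, and the names (NestedSpanSketch, Span, term, orderedNear) are hypothetical. It models a span as a range of token positions and shows how the result of one "near" match can be fed into another as a single unit. The gap-based slop used here is a simplification. In Lucene itself the corresponding classes are SpanTermQuery and SpanNearQuery; SpanNearQuery accepts an array of SpanQuery clauses, which is what makes nesting possible.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of span matching; NOT Lucene's algorithm.
class NestedSpanSketch {
    /** A matched region of token positions: [start, end). */
    static final class Span {
        final int start, end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /** Every occurrence of `term` as a one-token span. */
    static List<Span> term(String[] tokens, String term) {
        List<Span> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) out.add(new Span(i, i + 1));
        }
        return out;
    }

    /**
     * Ordered "near": a span from `left` followed by a span from `right`
     * with at most `slop` tokens in between (assumed slop definition).
     * The inputs may themselves be results of orderedNear, i.e. nested.
     */
    static List<Span> orderedNear(List<Span> left, List<Span> right, int slop) {
        List<Span> out = new ArrayList<>();
        for (Span l : left) {
            for (Span r : right) {
                int gap = r.start - l.end;
                if (gap >= 0 && gap <= slop) out.add(new Span(l.start, r.end));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tokens = "the quick brown fox jumps over the lazy dog".split(" ");
        // Inner unit: the exact phrase "quick brown" (slop 0).
        List<Span> phrase = orderedNear(term(tokens, "quick"), term(tokens, "brown"), 0);
        // Outer query: that phrase, as one unit, followed by "dog" within 5 tokens.
        List<Span> nested = orderedNear(phrase, term(tokens, "dog"), 5);
        System.out.println("nested matches: " + nested.size());
    }
}
```

The point of the sketch is only that the inner match is treated as a single span by the outer match, which a flat PhraseQuery cannot express.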


I wasn't aware that SpanQuery can be nested. Is there a link where I
could read more about this?

> To formulate a single query for your requirements,
> there is still the problem that PhraseQuery and SpanQuery only work when
> all their "terms" are present in an indexed Lucene document field.
> Putting it differently, when fewer terms are present, their order cannot
> be taken into account, unless the query contains a (non)ordered query
> specifying a subset of the terms present in the documents.
>

I was thinking of building a boolean combination of phrase/span queries,
each on a subset of the terms. Though it's not exhaustive, it might be
sufficient in the majority of cases.
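
A minimal sketch of one way to pick those subsets, assuming overlapping groups of consecutive query terms (the grouping rule, the class name TermGroupSketch and both method names are hypothetical, chosen only for illustration). Each group is rendered as a sloppy phrase in Lucene query parser syntax ("..."~slop) and the groups are joined with OR, so a document can still match when only some of the groups are present.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper; the consecutive-window grouping is an assumption.
class TermGroupSketch {
    /** Overlapping groups of `size` consecutive terms, advancing one term at a time. */
    static List<List<String>> groups(List<String> terms, int size) {
        if (size < 1) throw new IllegalArgumentException("size must be >= 1");
        List<List<String>> out = new ArrayList<>();
        for (int start = 0; start + size <= terms.size(); start++) {
            out.add(new ArrayList<>(terms.subList(start, start + size)));
        }
        return out;
    }

    /** Renders each group as a sloppy phrase and ORs the groups together. */
    static String toQueryString(List<List<String>> groups, int slop) {
        StringBuilder sb = new StringBuilder();
        for (List<String> group : groups) {
            if (sb.length() > 0) sb.append(" OR ");
            sb.append('"').append(String.join(" ", group)).append('"')
              .append('~').append(slop);
        }
        return sb.toString();
    }
}
```

With 15 to 30 query terms and groups of three, this yields 13 to 28 clauses. If there are fewer terms than the group size, no groups are produced, so a caller would need a fallback for that case.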

> An alternative to the current span query implementation is here:
> http://issues.apache.org/jira/browse/LUCENE-413
> but this will only help to get an impression of how to match in the ordered
> and unordered cases.
> It might be possible to generalize the various span algorithms there and
> in the trunk to work with fewer "terms".
>
I will consider that option.

Thanks,

Rajesh Munavalli
