lucene-java-user mailing list archives

From "Rajesh Munavalli" <>
Subject Phrase query vs span query
Date Tue, 21 Feb 2006 23:45:12 GMT
I am trying to adopt Lucene for a special IR system. The following scenario
is an approximation of what I am trying to do. Please bear with me if some
things don't make sense. I need some suggestions on formulating queries for
the following scenario.

Each document consists of a set of fields (standard in Lucene). But in my
case, the field is somewhat different as explained below.

Each field consists of a set of conceptual sections. The sections are
separated by N (say 1000) index positions but are in the same field.
Section sizes vary and have no lower or upper bound on the number of
terms they may contain.
Ex: Let's say the field "contents" has
<section 1 of 100 terms><gap of 1000 term positions><section 2 of 1500
terms><gap of 1000 term positions><gap of 1000 term positions><section 3
of 10 terms>
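
To make the layout concrete, here is a minimal sketch of the position arithmetic. It is plain Java, not Lucene API: SectionLayout, sectionStarts and sectionOf are hypothetical names, and it assumes exactly one gap between consecutive sections, with the gap counted as unused positions between the last term of one section and the first term of the next.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: models where each section starts
// when consecutive sections are separated by a fixed position gap.
class SectionLayout {

    // lengths[i] is the number of terms in section i; gap is the number of
    // unused positions between the last term of one section and the first
    // term of the next (assumption: a single gap between sections).
    static int[] sectionStarts(int[] lengths, int gap) {
        int[] starts = new int[lengths.length];
        int next = 0; // position of the first term of the current section
        for (int i = 0; i < lengths.length; i++) {
            starts[i] = next;
            next += lengths[i] + gap;
        }
        return starts;
    }

    // Index of the section containing the position, or -1 if it falls in a gap.
    static int sectionOf(int position, int[] starts, int[] lengths) {
        for (int i = 0; i < starts.length; i++) {
            if (position >= starts[i] && position < starts[i] + lengths[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] lengths = {100, 1500, 10};
        int[] starts = sectionStarts(lengths, 1000);
        System.out.println(Arrays.toString(starts)); // [0, 1100, 3600]
    }
}
```

Under these assumptions, any proximity window narrower than the gap cannot straddle two sections, which is the property the gap is meant to provide.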

NOTE: At index time, I am assuming I somehow know how to form these

Typical Query:
Consists of 15 to 30 query terms. In other words, these query terms
represent a conceptual section.

Aim of the Query formation:
I want to rank the documents proportional to the number of query terms
appearing in the SAME SECTION and IN ORDER. Documents containing terms with
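
One way to pin down the ranking I have in mind, as a sketch only: treat each section as a list of terms and count the longest run of query terms that appears in the section in query order (not necessarily adjacent), then score the document by its best section. This is plain Java outside Lucene; InOrderSectionScore and its methods are hypothetical names, and using a longest-common-subsequence count is my assumption about what "in order" should mean.

```java
import java.util.List;

// Hypothetical reference scorer, not Lucene code: it states the intended
// ranking, it is not an efficient way to compute it over an index.
class InOrderSectionScore {

    // Length of the longest subsequence of queryTerms that appears, in the
    // same order, in sectionTerms (classic longest-common-subsequence DP).
    static int inOrderMatches(List<String> queryTerms, List<String> sectionTerms) {
        int[][] dp = new int[queryTerms.size() + 1][sectionTerms.size() + 1];
        for (int i = 1; i <= queryTerms.size(); i++) {
            for (int j = 1; j <= sectionTerms.size(); j++) {
                if (queryTerms.get(i - 1).equals(sectionTerms.get(j - 1))) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[queryTerms.size()][sectionTerms.size()];
    }

    // Document score: the best single section, because matching terms must
    // share a section to count together.
    static int documentScore(List<String> queryTerms, List<List<String>> sections) {
        int best = 0;
        for (List<String> section : sections) {
            best = Math.max(best, inOrderMatches(queryTerms, section));
        }
        return best;
    }
}
```

For example, with query terms a, b, c, d, a section "b a c d" scores 3, while a section "d c b a" scores 1 because no two query terms appear in query order.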

My Questions:
Considering the structure of the fields/documents and the number of query

(1) Is there an effective way of formulating a query with the existing query
types in Lucene?

(2) After considering the way different queries work and their limitations,
I think forming phrase/span queries of groups of query terms
might approximate the rankings I am expecting. In that case which of the
following queries will perform better (in terms of QUERY SPEED and RANKING)
              (a) phrase query with a certain slop factor
              (b) span query
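
By "groups of query terms" I mean something like the sketch below: split the ordered query terms into windows, optionally overlapping, and build one proximity clause per window (a PhraseQuery with slop, or an in-order SpanNearQuery). The sketch is plain Java and only does the splitting; QueryTermGroups, groupSize and step are hypothetical names, and the right values for them are an open question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Lucene code: produces the term groups that would
// each be turned into one phrase or span clause.
class QueryTermGroups {

    // Splits terms into consecutive groups of at most groupSize terms,
    // advancing by step each time. A step smaller than groupSize gives
    // overlapping groups; the last group may be shorter.
    static List<List<String>> groups(List<String> terms, int groupSize, int step) {
        if (groupSize <= 0 || step <= 0) {
            throw new IllegalArgumentException("groupSize and step must be positive");
        }
        List<List<String>> out = new ArrayList<>();
        for (int start = 0; start < terms.size(); start += step) {
            int end = Math.min(start + groupSize, terms.size());
            out.add(new ArrayList<>(terms.subList(start, end)));
            if (end == terms.size()) {
                break; // the final terms are covered; stop before emitting fragments
            }
        }
        return out;
    }
}
```

With seven terms, a group size of 3 and a step of 2, this yields three groups: terms 1-3, 3-5 and 5-7.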


Rajesh Munavalli
