lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rajesh Munavalli" <>
Subject Re: Phrase query vs span query
Date Wed, 22 Feb 2006 03:04:07 GMT
On 2/21/06, Chris Hostetter <> wrote:
> your "Aim of the Query formation" got truncated, so it's not entirely
> clear what you are looking for, but if the general idea of what you are

Documents should be ordered as follows
Rank 1: Documents containing section containing all terms in the order of
the query terms
Rank 2: Documents containing section containing all terms but randomly
Rank 3: Documents containing atleast n (n  < N, where N is total number of
query terms) in the same section and in order

and so on...

But considering the number of query terms to be large, I am not really
looking for the exact ranking. Anything closer would  work just fine.
However, the number of query terms is what i am more concerned about when it
comes to implementing using Phrase query versus Span query in terms of query

looking for is that you want searches for phrase like "quick brown fox" to
> only match if/when the words "quick" "brown" and "fox" all appear in the
> same section in the specified order, and you want documents in which the
> phrases appear more then once to bescored higher then a simple PhraseQuery
> with a high slop factor and "inOrder=true" should work fine ... the key
> being that your slop value needs to be at least as big as the largest
> section size you can have, and less then the gap you put between sections.
> I have no idea if it will be faster/slower then a span query, but it's a
> little simpler because you don't need to use artificial section boundry
> tokens.

Wouldnt the  phrase/span query match across sections if I do not introduced
the artifical section boundry?

If you want to tweak how much the score is influenced by the proximity of
> the words in the query, vs the frequency of hte phrases in the docs, see
> my recent posting about the use of tf in Similarity -- which i think is
> accurate since nobody replied and said i was wrong...

I will take a closer look at the explaination.


Rajesh Munavalli

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message