lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Doron Cohen" <cdor...@gmail.com>
Subject Re: Query to ignore certain phrases
Date Tue, 12 Aug 2008 07:45:09 GMT
I can't see how to accomplish this without writing some special code,
and not just because of query parsing.

Phrases are searched by iterating the participating term
positions and when a match is found say for "b c" there is no way
to know whether another query "a b c d" matches exactly the corresponding
position.

Writing your own query to do this, one straightforward way is for its scorer

to maintain two exact phrase sub-scorers, but, (1) use for the score just
the
phraseFreq() found by the scorer of the inner phrase ("b c"), and (2)
if both phraseFreq() are the same - that of the inner phrase and that
of the outer phrase ("a b c d") ignore the document all together.

If this is where you're heading, start in the package javadocs for the
search package,
and see the code of PhraseScorer.

Span queries are another direction I would check for this, perhaps a higher
level
code can suffice with spans, but I don't know for sure.

Doron

On Tue, Aug 12, 2008 at 2:18 AM, Jeff French <jeff@mdbconsulting.com> wrote:

>
> We're trying to perform a query where if our intended search term/phrase is
> part of a specific larger phrase, we want to ignore that particular match,
> but not the entire document (unless of course there are no other hits with
> our intended term/phrase). For example, a query like:
>
>    "white house" UNLESS "russian white house"
>
> should not produce a match on the phrase:
>
>    "russian white house"
>
> but should match:
>
>    "white house"
>
> Where this differs from the NOT operator is that we don't want to rule out
> a
> document just because it contains "russian white house", we just want to
> ignore the hit, so that this phrase:
>
>    "... in the russian white house as opposed to the american white house
> ..."
>
> would return the document.
>
> Can this be accomplished using Lucene or Qsol QueryParser syntax, or do we
> need to write something custom?
> --
> View this message in context:
> http://www.nabble.com/Query-to-ignore-certain-phrases-tp18935560p18935560.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message