lucene-java-user mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Lucene retrieval model
Date Tue, 30 Dec 2008 11:09:17 GMT
On Tuesday 30 December 2008 10:03:03, Claudia Santos wrote:
> Hello,
>
> I would like to know more about Lucene's retrieval model, more
> specifically about the boolean model.
> Is it the standard model or an extended one? That is, does it return
> only the documents that match the boolean expression, or does it
> include in the search result all documents that contain at least one
> of the terms, regardless of the boolean connectors (AND, OR, NOT),
> and calculate a weight between 0 and 1 for each of them? The extended
> model gives a document that contains only one of the terms a smaller
> value than one that contains both.
>
> On the Apache Lucene Scoring page I did not find much about this:
> "Lucene scoring uses a combination of the Vector Space Model (VSM) of
> Information Retrieval and the Boolean model to determine how relevant
> a given Document is to a User's query. In general, the idea behind
> the VSM is the more times a query term appears in a document relative
> to the number of times the term appears in all the documents in the
> collection, the more relevant that document is to the query. It uses
> the Boolean model to first narrow down the documents that need to be
> scored based on the use of boolean logic in the Query specification.
> Lucene also adds some capabilities and refinements onto this model to
> support boolean and fuzzy searching, but it essentially remains a VSM
> based system at the heart."
>

A somewhat refined Boolean model is used to determine a set of
documents, and a score value is calculated only for the documents
in that set, according to the Lucene VSM.

The Boolean model in Lucene does not directly use the standard
boolean connectors. Instead, each clause (term or subquery) is
either required, optional or prohibited.
The required and prohibited clauses determine the set of
documents to be scored in the normal Boolean AND/NOT way.

The refinement in the Boolean model concerns the optional clauses:
a minimum number of optional clauses can be required to match for
a document to be part of the set that is scored.
The normal Boolean OR operator corresponds to a minimum of 1,
and in Lucene this minimum defaults to 1 when no required clauses
are present.
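
The selection rule above can be sketched in plain Java. This is only
an illustration, not Lucene code: the class and method names are
hypothetical, a document is reduced to the set of terms it contains,
and a default of 0 when required clauses are present is my assumption
(the text above only states the default of 1 when there are none).

```java
import java.util.List;
import java.util.Set;

// Illustrative sketch only: plain Java collections stand in for the index.
final class BooleanFilterSketch {

    // True when the document belongs to the set that will be scored.
    static boolean matches(Set<String> docTerms,
                           List<String> required,
                           List<String> optional,
                           List<String> prohibited,
                           int minOptional) {
        // Required clauses behave like Boolean AND.
        for (String term : required) {
            if (!docTerms.contains(term)) {
                return false;
            }
        }
        // Prohibited clauses behave like Boolean NOT.
        for (String term : prohibited) {
            if (docTerms.contains(term)) {
                return false;
            }
        }
        // Optional clauses: at least minOptional of them must match.
        int matched = 0;
        for (String term : optional) {
            if (docTerms.contains(term)) {
                matched++;
            }
        }
        return matched >= minOptional;
    }

    // Default minimum: 1 without required clauses (plain OR behaviour),
    // 0 otherwise (assumption, see the note above).
    static int defaultMinOptional(List<String> required) {
        return required.isEmpty() ? 1 : 0;
    }
}
```

With no required clauses and a minimum of 1, the optional clauses act
as a plain OR. Raising the minimum narrows the set without turning
any single optional clause into a required one.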

The required clauses and the optional clauses contribute to the score.
One might consider the scoring of the optional clauses to be an
implementation of the extended Boolean model.
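
As a rough illustration of why this resembles the extended Boolean
model, the sketch below sums a per-term weight over the required and
optional clauses that match, so a document matching more optional
clauses gets a higher score. The class name and the weight map are
hypothetical, and the real Lucene score formula has more factors than
this sum.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: a simple additive score, not Lucene's formula.
final class ClauseScoreSketch {

    // Sums the weights of the required and optional clauses that match.
    // Prohibited clauses only filter documents, so they are not scored here.
    static double score(Set<String> docTerms,
                        List<String> required,
                        List<String> optional,
                        Map<String, Double> weight) {
        double sum = 0.0;
        for (String term : required) {
            if (docTerms.contains(term)) {
                sum += weight.getOrDefault(term, 0.0);
            }
        }
        for (String term : optional) {
            if (docTerms.contains(term)) {
                sum += weight.getOrDefault(term, 0.0);
            }
        }
        return sum;
    }
}
```

For a query with two optional terms, a document containing both ends
up with a larger sum than a document containing only one of them.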

Fuzzy searching is implemented by constructing a Boolean query
whose optional clauses are the terms actually present in the index
that are similar enough to the fuzzy query term.
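
A minimal sketch of that expansion, assuming plain Levenshtein edit
distance and a hypothetical maxEdits threshold. The similarity
measure and cut-off that Lucene itself uses may differ, and the class
and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: expands a fuzzy term into optional clause terms.
final class FuzzyExpansionSketch {

    // Classic dynamic-programming Levenshtein distance using two rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                                  prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Keeps the indexed terms within maxEdits of the fuzzy term; each one
    // would become an optional clause of the resulting Boolean query.
    static List<String> expand(String fuzzyTerm,
                               List<String> indexedTerms,
                               int maxEdits) {
        List<String> optionalTerms = new ArrayList<>();
        for (String term : indexedTerms) {
            if (editDistance(fuzzyTerm, term) <= maxEdits) {
                optionalTerms.add(term);
            }
        }
        return optionalTerms;
    }
}
```

Only terms taken from the indexed vocabulary are considered, which
matches the "actually present" condition: a similar string that
occurs in no document never becomes a clause.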

Regards,
Paul Elschot
