Search over indices. Applications usually call {@link org.apache.lucene.search.Searcher#search(Query)} or {@link org.apache.lucene.search.Searcher#search(Query,Filter)}.

Query Classes

TermQuery

Of the various implementations of Query, the TermQuery is the easiest to understand and the most often used in applications. A TermQuery matches all the documents that contain the specified Term, which is a word that occurs in a certain Field. Thus, a TermQuery identifies and scores all Documents that have a Field with the specified string in it. Constructing a TermQuery requires only a Term, which pairs a field name with the text to match.

BooleanQuery

Things start to get interesting when one combines multiple TermQuery instances into a BooleanQuery. A BooleanQuery contains multiple BooleanClauses, where each clause contains a sub-query (Query instance) and an operator from BooleanClause.Occur (MUST, SHOULD, or MUST_NOT) describing how that sub-query is combined with the other clauses.
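The way clauses combine can be sketched without Lucene at all. The following is a minimal illustration, not Lucene code: BooleanClauseSketch, Clause, and match are hypothetical names, each sub-query is reduced to the set of document ids it matches, and scoring is ignored. It assumes the usual reading of the three operators: every MUST clause has to match, no MUST_NOT clause may match, and SHOULD clauses decide matching only when there is no MUST clause.

```java
import java.util.Set;
import java.util.TreeSet;

/** Toy model of boolean clause matching; all names are hypothetical, not Lucene API. */
final class BooleanClauseSketch {
    enum Occur { MUST, SHOULD, MUST_NOT }

    /** One clause: the doc ids matched by a sub-query, plus its operator. */
    static final class Clause {
        final Set<Integer> docs;
        final Occur occur;

        Clause(Set<Integer> docs, Occur occur) {
            this.docs = docs;
            this.occur = occur;
        }
    }

    /** Returns the doc ids matched by the combination of all clauses. */
    static Set<Integer> match(Iterable<Clause> clauses) {
        Set<Integer> required = null;                 // intersection of MUST clauses
        Set<Integer> optional = new TreeSet<Integer>();   // union of SHOULD clauses
        Set<Integer> prohibited = new TreeSet<Integer>(); // union of MUST_NOT clauses
        for (Clause c : clauses) {
            switch (c.occur) {
                case MUST:
                    if (required == null) {
                        required = new TreeSet<Integer>(c.docs);
                    } else {
                        required.retainAll(c.docs);
                    }
                    break;
                case SHOULD:
                    optional.addAll(c.docs);
                    break;
                case MUST_NOT:
                    prohibited.addAll(c.docs);
                    break;
            }
        }
        // Without any MUST clause, a document has to match at least one SHOULD clause.
        Set<Integer> result = (required != null) ? required : optional;
        result.removeAll(prohibited);
        return result;
    }
}
```

With a MUST clause matching {1, 2, 3}, a SHOULD clause matching {2, 5}, and a MUST_NOT clause matching {3}, this sketch returns documents 1 and 2. In a real BooleanQuery the SHOULD clause would still raise the score of document 2.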

Phrases

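Another common search is to find documents containing certain phrases. This is handled in two different ways: with PhraseQuery, which matches a sequence of Terms, and with SpanNearQuery, which matches a sequence of other SpanQuery instances.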

RangeQuery

The RangeQuery matches all documents containing terms that fall in the exclusive range between a lower Term and an upper Term. For example, one could find all documents that have terms beginning with the letters a through c. This type of Query is frequently used to find documents that occur in a specific date range.
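The idea of an exclusive term range can be illustrated with a sorted set of strings. This is a sketch under stated assumptions rather than Lucene code: TermRangeSketch and exclusiveRange are hypothetical names, and terms are assumed to compare lexicographically, which is why the dates below are written in a sortable yyyyMMdd form.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

/** Toy model of a term range over a sorted term dictionary; names are hypothetical. */
final class TermRangeSketch {
    /** Returns the terms strictly between lower and upper (both bounds excluded). */
    static NavigableSet<String> exclusiveRange(NavigableSet<String> terms, String lower, String upper) {
        return terms.subSet(lower, false, upper, false);
    }

    public static void main(String[] args) {
        NavigableSet<String> dates = new TreeSet<String>();
        dates.add("20020101");
        dates.add("20020115");
        dates.add("20020131");
        dates.add("20020201");
        // Only the term strictly inside the bounds is selected.
        System.out.println(exclusiveRange(dates, "20020101", "20020131"));
    }
}
```

Because the comparison is on strings, a date stored as 1/15/2002 would not sort in calendar order, so the stored form of the term matters as much as the query.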

PrefixQuery, WildcardQuery

While the PrefixQuery has a different implementation, it is essentially a special case of the WildcardQuery. The PrefixQuery allows an application to identify all documents with terms that begin with a certain string. The WildcardQuery generalizes this by allowing for the use of * (matches 0 or more characters) and ? (matches exactly one character) wildcards. Note that the WildcardQuery can be quite slow. Also note that a WildcardQuery should not start with * or ?, as such queries are extremely slow. For tricks on how to search using a wildcard at the beginning of a term, see Starts With x and Ends With x Queries from the Lucene users' mailing list.
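The two wildcard characters can be illustrated with a small matcher over a single term. This is a sketch only: WildcardSketch and matches are hypothetical names, and the code says nothing about how Lucene enumerates terms internally. It does show why a prefix search is the special case of a pattern that ends in a single *.

```java
/** Toy wildcard matcher for one term; names are hypothetical, not Lucene API. */
final class WildcardSketch {
    /** True if text matches pattern, where * matches 0 or more characters and ? exactly one. */
    static boolean matches(String pattern, String text) {
        int p = 0;      // position in pattern
        int t = 0;      // position in text
        int star = -1;  // index of the most recent * in pattern, or -1
        int mark = 0;   // text position where that * started matching
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++;   // remember the star and first try matching zero characters
                mark = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (star >= 0) {
                p = star + 1; // backtrack: let the last * absorb one more character
                t = ++mark;
            } else {
                return false;
            }
        }
        // Any trailing * in the pattern can match the empty string.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }
}
```

Here matches("lucen*", "lucene") behaves like a prefix test, while a pattern such as "*ing" gives the matcher no fixed leading characters to narrow the candidate terms, which is one way to see why leading wildcards are costly.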

FuzzyQuery

A FuzzyQuery matches documents that contain terms similar to the specified term. Similarity is determined using Levenshtein (edit) distance. This type of query can be useful when accounting for spelling variations in the collection.
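For readers unfamiliar with Levenshtein distance, the following sketch computes it with the standard dynamic-programming recurrence. EditDistanceSketch and levenshtein are hypothetical names; how FuzzyQuery turns the distance into a similarity threshold is not shown here.

```java
/** Classic two-row Levenshtein distance; names are hypothetical, not Lucene API. */
final class EditDistanceSketch {
    /** Minimum number of single-character inserts, deletes, and substitutions turning a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, "colour" and "color" are one edit apart, so a fuzzy search for one spelling can reasonably be expected to find the other.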

Changing Similarity

Chances are DefaultSimilarity is sufficient for all your searching needs. However, in some applications it may be necessary to customize your Similarity implementation. For instance, some applications do not need to distinguish between shorter and longer documents (see a "fair" similarity).
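The difference between length-sensitive and "fair" scoring comes down to the length normalization factor. The sketch below is an assumption-laden illustration, not Lucene code: LengthNormSketch and both method names are hypothetical, and the 1/sqrt(numTerms) formula is assumed here to be representative of a default length norm.

```java
/** Contrasts a length-sensitive norm with a "fair" one; names are hypothetical. */
final class LengthNormSketch {
    /** Assumed default-style norm: fields with fewer terms get a larger factor. */
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** "Fair" variant: every field gets the same factor regardless of length. */
    static float fairLengthNorm(int numTerms) {
        return 1.0f;
    }
}
```

Under the first method a 4-term field is weighted more heavily than a 400-term field. Under the second they are treated alike, which is the behavior a "fair" similarity is after.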

To change Similarity, one must do so for both indexing and searching, and the changes must happen before either of these actions takes place. Although in theory there is nothing stopping you from changing mid-stream, it just isn't well-defined what is going to happen.

If you are interested in use cases for changing your similarity, see the Lucene users' mailing list at Overriding Similarity. In summary, here are a few use cases:

Changing Scoring -- Expert Level

Changing scoring is an expert level task, so tread carefully and be prepared to share your code if you want help.

With the warning out of the way, it is possible to change a lot more than just the Similarity when it comes to scoring in Lucene. Lucene's scoring is a complex mechanism that is grounded by three main classes:

The Query Class

In some sense, the Query class is where it all begins. Without a Query, there would be nothing to score. Furthermore, the Query class is the catalyst for the other scoring classes as it is often responsible for creating them or coordinating the functionality between them. The Query class has several methods that are important for derived classes:

The Weight Interface

The Weight interface provides an internal representation of the Query so that it can be reused. Any Searcher-dependent state should be stored in the Weight implementation, not in the Query class. The interface defines 6 methods that must be implemented:

The Scorer Class

The Scorer abstract class provides common scoring functionality for all Scorer implementations and is the heart of the Lucene scoring process. The Scorer defines the following abstract methods which must be implemented:

Why would I want to add my own Query?

In a nutshell, you want to add your own custom Query implementation when you think that Lucene's built-in queries aren't appropriate for the task that you want to do. You might be doing some cutting-edge research, or you might need more information back out of Lucene (similar to Doug adding SpanQuery functionality).

Examples

Changing scoring is an expert level task, so tread carefully and be prepared to share your code if you want help.

With the warning out of the way, it is possible to change a lot more than just the Similarity when it comes to scoring in Lucene. Lucene's scoring is a complex mechanism that is grounded by three main classes:

- Query -- The abstract object representation of the user's information need.
- Weight -- The internal interface representation of the user's Query, so that Query objects may be reused.
- Scorer -- An abstract class containing common functionality for scoring. Provides both scoring and explanation capabilities.

Details on each of these classes, and their children, can be found in the subsections below.

At a much deeper level, one can affect scoring by implementing their own Query classes (and related scoring classes). To learn more about how to do this, refer to the search package javadocs.


In some sense, the Query class is where it all begins. Without a Query, there would be nothing to score. Furthermore, the Query class is the catalyst for the other scoring classes as it is often responsible for creating them or coordinating the functionality between them. The Query class has several methods that are important for derived classes:

- createWeight(Searcher searcher) -- A Weight is the internal representation of the Query, so each Query implementation must provide an implementation of Weight. See the subsection on The Weight Interface below for details on implementing the Weight interface.
- rewrite(IndexReader reader) -- Rewrites queries into primitive queries. Primitive queries are: TermQuery, BooleanQuery, OTHERS????


The Weight interface provides an internal representation of the Query so that it can be reused. Any Searcher-dependent state should be stored in the Weight implementation, not in the Query class. The interface defines 6 methods that must be implemented:

- Weight#getQuery() -- Pointer to the Query that this Weight represents.
- Weight#getValue() -- The weight for this Query. For example, the TermQuery.TermWeight value is equal to idf^2 * boost * queryNorm.
- Weight#sumOfSquaredWeights() -- The sum of squared weights. For TermQuery, this is (idf * boost)^2.
- Weight#normalize(float) -- Determine the query normalization factor. The query normalization may allow for comparing scores between queries.
- Weight#scorer(IndexReader) -- Construct a new Scorer for this Weight. See The Scorer Class below for help defining a Scorer. As the name implies, the Scorer is responsible for doing the actual scoring of documents given the Query.
- Weight#explain(IndexReader, int) -- Provide a means for explaining why a given document was scored the way it was.
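The relationship between sumOfSquaredWeights(), normalize(float), and getValue() for a single term can be traced with a few lines of arithmetic. The class below is a hypothetical stand-in, not the real TermQuery.TermWeight, and the queryNorm helper assumes a normalization of 1/sqrt(sumOfSquaredWeights), which is an assumption of this sketch rather than something stated above.

```java
/** Toy term weight following the formulas in the list above; names are hypothetical. */
final class TermWeightSketch {
    private final float idf;
    private final float boost;
    private float queryWeight;
    private float value;

    TermWeightSketch(float idf, float boost) {
        this.idf = idf;
        this.boost = boost;
    }

    /** (idf * boost)^2, the term's contribution before normalization. */
    float sumOfSquaredWeights() {
        queryWeight = idf * boost;
        return queryWeight * queryWeight;
    }

    /** Applies the query normalization factor; value becomes idf^2 * boost * queryNorm. */
    void normalize(float queryNorm) {
        queryWeight *= queryNorm;
        value = queryWeight * idf;
    }

    float getValue() {
        return value;
    }

    /** Assumed normalization: 1 / sqrt(sum of squared weights). */
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }
}
```

With idf = 2 and boost = 3, the sum of squared weights is 36, the assumed queryNorm is 1/6, and the resulting value is 2, which agrees with idf^2 * boost * queryNorm = 4 * 3 / 6.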


The Scorer abstract class provides common scoring functionality for all Scorer implementations and is the heart of the Lucene scoring process. The Scorer defines the following abstract methods which must be implemented:

- Scorer#next() -- Advances to the next document that matches this Query, returning true if and only if there is another document that matches.
- Scorer#doc() -- Returns the id of the Document that contains the match. Is not valid until next() has been called at least once.
- Scorer#score() -- Return the score of the current document. This value can be determined in any appropriate way for an application. For instance, the TermScorer returns tf * Weight.getValue() * fieldNorm.
- Scorer#skipTo(int) -- Skip ahead in the document matches to the document whose id is greater than or equal to the passed in value. In many instances, skipTo can be implemented more efficiently than simply looping through all the matching documents until the target document is identified.
- Scorer#explain(int) -- Provides details on why the score came about.
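The iteration contract described by next(), doc(), score(), and skipTo(int) can be sketched with a scorer backed by a sorted array. ArrayScorerSketch is a hypothetical class that does not extend the real Scorer, and explain(int) is left out. It assumes doc ids are unique and ascending, which lets skipTo use a binary search instead of stepping through every match.

```java
import java.util.Arrays;

/** Toy scorer over precomputed matches; names are hypothetical, not Lucene API. */
final class ArrayScorerSketch {
    private final int[] docs;     // matching doc ids, unique and ascending
    private final float[] scores; // score for the doc at the same index
    private int pos = -1;         // before the first match until next() is called

    ArrayScorerSketch(int[] docs, float[] scores) {
        this.docs = docs;
        this.scores = scores;
    }

    /** Advances to the next match; true if and only if there is one. */
    boolean next() {
        if (pos < docs.length) {
            pos++;
        }
        return pos < docs.length;
    }

    /** Id of the current match; only valid after next() or skipTo() returned true. */
    int doc() {
        return docs[pos];
    }

    /** Score of the current match. */
    float score() {
        return scores[pos];
    }

    /** Moves to the first match beyond the current one whose id is >= target. */
    boolean skipTo(int target) {
        int from = Math.min(pos + 1, docs.length);
        int i = Arrays.binarySearch(docs, from, docs.length, target);
        pos = (i >= 0) ? i : -i - 1; // on a miss, binarySearch encodes the insertion point
        return pos < docs.length;
    }
}
```

A caller would typically loop with while (scorer.next()) and read doc() and score() inside the loop, switching to skipTo(target) when it already knows that earlier documents cannot match.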




FILL IN HERE


Table Of Contents

Search

Query Classes

TermQuery

BooleanQuery

Phrases

RangeQuery

PrefixQuery, WildcardQuery

FuzzyQuery

Changing Similarity

Changing Scoring -- Expert Level

The Query Class

The Weight Interface

The Scorer Class

Why would I want to add my own Query?

Examples
