jackrabbit-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Jackrabbit Wiki] Update of "Search" by MarcelReutegger
Date Tue, 15 May 2007 12:39:50 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jackrabbit Wiki" for change notification.

The following page has been changed by MarcelReutegger:
http://wiki.apache.org/jackrabbit/Search

New page:
The search index in Jackrabbit is pluggable and has a default implementation based on Apache
Lucene. This default implementation has the following options:

|| '''Parameter''' || '''Default Value''' || '''Description''' || '''Since''' ||
|| path || ''none'' || The location of the index directory. This parameter is mandatory. A reasonable value is: {{{${wsp.home}}}} || 1.0 ||
|| useCompoundFile || true || Advises Lucene to use compound files for the index files. || 1.0 ||
|| minMergeDocs || 100 || Minimum number of nodes in an index until segments are merged. || 1.0 ||
|| volatileIdleTime || 3 || Idle time in seconds until the volatile index part is moved to a persistent index even though minMergeDocs is not reached. || 1.0 ||
|| maxMergeDocs || 100000 || Maximum number of nodes in segments that will be merged. || 1.0 ||
|| mergeFactor || 10 || Determines how often segment indices are merged. || 1.0 ||
|| maxFieldLength || 10000 || The maximum number of words per property that are fulltext indexed. || 1.1 ||
|| bufferSize || 10 || Maximum number of documents that are held in a pending queue until added to the index. || 1.0 ||
|| cacheSize || 1000 || Size of the document number cache. This cache maps UUIDs to Lucene document numbers. || 1.0 ||
|| forceConsistencyCheck || false || Runs a consistency check on every startup. If false, a consistency check is only performed when the search index detects a prior forced shutdown. || 1.0 ||
|| autoRepair || true || Errors detected by a consistency check are automatically repaired. If false, errors are only written to the log. || 1.0 ||
|| analyzer || {{{org.apache.lucene.analysis.standard.StandardAnalyzer}}} || Class name of a Lucene analyzer to use for fulltext indexing of text. || 1.0 ||
|| queryClass || {{{org.apache.jackrabbit.core.query.QueryImpl}}} || Class name that implements the {{{javax.jcr.query.Query}}} interface. This class must also extend {{{org.apache.jackrabbit.core.query.AbstractQueryImpl}}}. || 1.0 ||
|| respectDocumentOrder || true || If true and the query does not contain an 'order by' clause, result nodes will be in document order. For better performance when queries return many nodes, set this to 'false'. || 1.0 ||
|| textFilterClasses || {{{org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter}}} || Sets the list of text filters (and text extractors) to use for extracting text content from binary properties. The list must be comma (or whitespace) separated, and contain fully qualified class names of the {{{TextFilter}}} (and since 1.3 {{{TextExtractor}}}) classes to be used. The configured classes must all have a public default constructor. || 1.0 ||
|| resultFetchSize || 2147483647 || The number of results the query handler should initially fetch when a query is executed. Default value: Integer.MAX_VALUE (-> all) || 1.2.1 ||
|| extractorPoolSize || 0 || Defines the maximum number of background threads that are used to extract text from binary properties. If set to zero (default), no background threads are allocated and text extractors run in the current thread. || 1.3 ||
|| extractorTimeout || 100 || A text extractor is executed using a background thread if it doesn't finish within this timeout, defined in milliseconds. This parameter has no effect if extractorPoolSize is zero. || 1.3 ||
|| extractorBackLogSize || 100 || The size of the extractor pool back log. If all threads in the pool are busy, incoming work is put into a wait queue. If the wait queue reaches the back log size, incoming extractor work will no longer be queued but will be executed with the current thread. || 1.3 ||
|| excerptProviderClass || 1.3: {{{org.apache.jackrabbit.core.query.lucene.DefaultXMLExcerpt}}}, >=1.4: {{{org.apache.jackrabbit.core.query.lucene.DefaultHTMLExcerpt}}} || The name of the class that implements {{{org.apache.jackrabbit.core.query.lucene.ExcerptProvider}}} and should be used for the rep:excerpt() function in a query. || 1.3 ||
|| supportHighlighting || false || If set to {{{true}}}, additional information is stored in the index to support highlighting using the rep:excerpt() function. || 1.3 ||
|| synonymProviderClass || ''none'' || The name of a class that implements {{{org.apache.jackrabbit.core.query.lucene.SynonymProvider}}}. The default value is null (-> not set). || 1.4 ||

'''Note''': All parameters except path have default values and can be omitted to use the default.
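
As a rough illustration of how these parameters might be set, the sketch below shows a search index element as it could appear in a workspace configuration. The enclosing {{{SearchIndex}}} element, its {{{class}}} attribute, and the {{{/index}}} suffix on the path are assumptions based on common Jackrabbit configurations and are not taken from this page; the parameter names and their meanings are the ones listed in the table above, and the values are only examples.

```xml
<!-- Minimal sketch, not a complete workspace configuration. -->
<!-- The element name and class attribute are assumptions (not stated on this page). -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <!-- Mandatory: location of the index directory. The /index suffix is an assumption. -->
  <param name="path" value="${wsp.home}/index"/>
  <!-- Store extra index data so rep:excerpt() can highlight matches (since 1.3). -->
  <param name="supportHighlighting" value="true"/>
  <!-- Skip document ordering for faster results on large result sets. -->
  <param name="respectDocumentOrder" value="false"/>
  <!-- Allow up to two background threads for text extraction (since 1.3). -->
  <param name="extractorPoolSize" value="2"/>
  <!-- Queue at most 100 pending extraction jobs before falling back to the current thread. -->
  <param name="extractorBackLogSize" value="100"/>
</SearchIndex>
```

Any parameter left out of such an element falls back to the default shown in the table, so only path strictly needs to be present.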


== Proprietary Features ==

Jackrabbit supports some advanced features that are not specified in JSR 170:

 * Get a text excerpt with highlighted words that matched the query: ["ExcerptProvider"]
 * Search for a term and its synonyms: ["SynonymSearch"]
 * Search for similar nodes: ["SimilaritySearch"]
