lucene-java-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-java Wiki] Update of "LuceneCaveats" by RenaudWaldura
Date Fri, 29 Jun 2007 21:54:42 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The following page has been changed by RenaudWaldura:
http://wiki.apache.org/lucene-java/LuceneCaveats

The comment on the change is:
API links

------------------------------------------------------------------------------
  
  === Use the same analyzer for indexing and querying ===
  
- Make sure you use the same Analyzer class when building your index
+ Make sure you use the same 
+ [http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/Analyzer.html Analyzer]
+ class when building your index
  and later querying against that index. Analysis dictates what goes
- into your index and how. The QueryParser needs this information 
+ into your index and how. The 
+ [http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html QueryParser]
+ needs this information 
  to generate the proper queries. 
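
To illustrate why the analyzer must match, here is a minimal toy sketch in plain Java. It does not use the Lucene API; the two `analyze...` methods are hypothetical stand-ins for two different Analyzer classes, and `matches` is a simplified stand-in for a term lookup.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class AnalyzerMismatchDemo {
    // Hypothetical analyzer #1: splits on whitespace and lowercases each term.
    static Set<String> analyzeLowercasing(String text) {
        return split(text.toLowerCase(Locale.ROOT));
    }

    // Hypothetical analyzer #2: splits on whitespace but preserves case.
    static Set<String> analyzeCasePreserving(String text) {
        return split(text);
    }

    static Set<String> split(String text) {
        Set<String> terms = new HashSet<String>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // True if every query term is present among the indexed terms.
    static boolean matches(Set<String> indexedTerms, Set<String> queryTerms) {
        return indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        Set<String> index = analyzeLowercasing("Apache Lucene");
        // Same analyzer on both sides: the term "lucene" is found.
        System.out.println(matches(index, analyzeLowercasing("Lucene")));
        // Different analyzer at query time: "Lucene" is not in the index.
        System.out.println(matches(index, analyzeCasePreserving("Lucene")));
    }
}
```

The second lookup fails only because the query term was never normalized the way the indexed terms were, which is the kind of mismatch this caveat warns about.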
  
  Dissimilar or incompatible analyzers lead to mysterious search
@@ -22, +26 @@

  
  === Large documents are truncated by default ===
  
- The indexer will by default truncate documents to IndexWriter.DEFAULT_MAX_FIELD_LENGTH 
+ The indexer will by default truncate documents to {{{IndexWriter.DEFAULT_MAX_FIELD_LENGTH}}}
  or 10,000 terms in Lucene 2.0. This limit 
- can easily be changed with IndexWriter.setMaxFieldLength().
+ can easily be changed with 
+ [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#setMaxFieldLength(int) IndexWriter.setMaxFieldLength()].
  
  === Stopwords are removed ===
  
+ [http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/standard/StandardAnalyzer.html StandardAnalyzer]
- StandardAnalyzer (the most commonly recommended analyzer) does not index
+ (the most commonly recommended analyzer) does not index
  "stopwords". Stopwords are common English words such as "the", "a", etc. -- the default list is
- StopAnalyzer.ENGLISH_STOP_WORDS. These words are completely ignored and 
+ {{{StopAnalyzer.ENGLISH_STOP_WORDS}}}. These words are completely ignored and 
  cannot be searched for, at all. This means that even phrase queries aren't 
  exact. E.g. the phrase query "to be or not to be" finds nothing at all.
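
A minimal toy sketch of the effect, in plain Java rather than the Lucene API: `analyze` is a hypothetical stand-in for an analyzer with stopword removal, and `STOP_WORDS` is an abbreviated illustrative list, not the actual constant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopwordDemo {
    // Abbreviated, illustrative stopword list (an assumption, not Lucene's constant).
    static final Set<String> STOP_WORDS = new HashSet<String>(Arrays.asList(
            "a", "an", "and", "be", "not", "of", "or", "the", "to"));

    // Toy analyzer: lowercase, split on non-letters, drop stopwords.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Every word of the phrase is a stopword, so nothing is left to search for.
        System.out.println(analyze("To be or not to be")); // prints []
        System.out.println(analyze("The quick fox"));      // prints [quick, fox]
    }
}
```

With no terms left after analysis, a phrase query built from that text has nothing to match against.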
  
@@ -45, +51 @@

  At first this might work with small indexes or beefy hardware, but you
  will soon run into performance problems -- e.g. large garbage collections.
  
- You should keep the index open as long as possible. Both IndexReader
+ You should keep the index open as long as possible. Both 
+ [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html IndexReader]
+ and
+ [http://lucene.apache.org/java/docs/api/org/apache/lucene/search/IndexSearcher.html IndexSearcher]
- and IndexSearcher are thread-safe and don't require additional
+ are thread-safe and don't require additional
  synchronization. One could cache the index searcher e.g. in the application 
  context. 
  
@@ -70, +79 @@

  
  === Use RangeFilter instead of RangeQuery ===
  
- RangeQuery expands every term in the range to a boolean expression,
+ [http://lucene.apache.org/java/docs/api/org/apache/lucene/search/RangeQuery.html RangeQuery] expands every term in the range to a boolean expression,
- and easily blows past the built-in BooleanQuery.maxClauseCount limit
+ and easily blows past the built-in {{{BooleanQuery.maxClauseCount}}} limit
  (Lucene 2.0 defaults to 1024).
  
- RangeFilter doesn't suffer from this limitation.
+ [http://lucene.apache.org/java/docs/api/org/apache/lucene/search/RangeFilter.html RangeFilter] doesn't suffer from this limitation.
  
  See: LuceneFAQ, "Why am I getting a TooManyClauses exception?".
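
To show how quickly a term-by-term expansion outgrows the clause limit, here is a toy sketch in plain Java. It does not call Lucene; `clausesNeeded` is a hypothetical helper that counts the indexed terms falling inside a range, and `MAX_CLAUSES = 1024` is assumed as the limit for illustration.

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class RangeExpansionDemo {
    // Assumed clause limit, used here only for illustration.
    static final int MAX_CLAUSES = 1024;

    // Number of distinct indexed terms in [lower, upper] (both inclusive),
    // i.e. how many clauses a one-clause-per-term expansion would need.
    static int clausesNeeded(SortedSet<String> indexedTerms, String lower, String upper) {
        // upper + "\0" is the smallest string greater than upper,
        // so the exclusive upper bound of subSet still includes upper itself.
        return indexedTerms.subSet(lower, upper + "\0").size();
    }

    // Hypothetical index contents: the terms "0000" through "9999".
    static SortedSet<String> fourDigitTerms() {
        SortedSet<String> terms = new TreeSet<String>();
        for (int i = 0; i < 10000; i++) {
            terms.add(String.format("%04d", i));
        }
        return terms;
    }

    public static void main(String[] args) {
        int needed = clausesNeeded(fourDigitTerms(), "1000", "2999");
        System.out.println(needed + " clauses needed, limit is " + MAX_CLAUSES);
        System.out.println("over the limit: " + (needed > MAX_CLAUSES));
    }
}
```

A filter avoids the problem because it marks matching documents directly instead of building one query clause per term.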
  
@@ -82, +91 @@

  
  === Edit the query rather than the string ===
  
- Parsing free text is a surprisingly hard problem at which QueryParser does a
+ Parsing free text is a surprisingly hard problem at which 
+ [http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html QueryParser]
- pretty good job. Rather than editing the query string, change the Query 
+ does a pretty good job. Rather than editing the query string, change the Query 
- objects returned by the QueryParser.
+ objects returned by the
+ [http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html QueryParser].
  
  E.g. when offering additional search options in a Web form, it's easier and safer
- to combine them with the parsed Query object rather than doing text manipulations on 
+ to combine them with the parsed
+ [http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Query.html Query]
+ object rather than doing text manipulations on 
  the query string.
  
  === Lucene is not a true boolean system ===
  
- Or: "apple AND banana OR orange" doesn't work. QueryParser does its best to 
+ Or: {{{apple AND banana OR orange}}} doesn't work. 
+ [http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html QueryParser]
+ does its best to 
  translate from a boolean syntax to Lucene's own set-oriented queries, but
  it falls short. Either use parens everywhere or try to design your user
  interface accordingly.
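
The gap between the two readings can be sketched with a toy model in plain Java. This is not the Lucene API. It assumes, for illustration only, that the parser turns the string into the clause set `+apple +banana orange` (two required terms and one optional term); the method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class BooleanVsClausesDemo {
    // Strict boolean reading with conventional precedence:
    // (apple AND banana) OR orange.
    static boolean strictBoolean(Set<String> doc) {
        return (doc.contains("apple") && doc.contains("banana")) || doc.contains("orange");
    }

    // Set-oriented reading, ASSUMING the parsed form is: +apple +banana orange.
    // Required terms must all be present; the optional term does not
    // decide whether the document matches.
    static boolean requiredOptional(Set<String> doc) {
        return doc.contains("apple") && doc.contains("banana");
    }

    static Set<String> doc(String... terms) {
        return new HashSet<String>(Arrays.asList(terms));
    }

    public static void main(String[] args) {
        Set<String> onlyOrange = doc("orange");
        System.out.println(strictBoolean(onlyOrange));    // prints true
        System.out.println(requiredOptional(onlyOrange)); // prints false
    }
}
```

Under that assumption a document containing only "orange" satisfies the boolean expression but not the clause set, which is why explicit parentheses are the safer choice.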
@@ -101, +116 @@

  
  === Iterating over all hits takes a long time ===
  
- This is by design. Try using a HitCollector instead if you need access to
+ This is by design. Try using a 
+ [http://lucene.apache.org/java/docs/api/org/apache/lucene/search/HitCollector.html HitCollector]
+ instead if you need access to
  all the hits for a search.
  
  === Highlighting search results ===
