lucene-java-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-java Wiki] Update of "LuceneCaveats" by RenaudWaldura
Date Fri, 29 Jun 2007 22:17:27 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The following page has been changed by RenaudWaldura:
http://wiki.apache.org/lucene-java/LuceneCaveats

------------------------------------------------------------------------------
  Dissimilar or incompatible analyzers lead to mysterious search
  behavior. See: ["LuceneFAQ"], "Why am I getting no hits / incorrect hits?".
  
- === Large documents are truncated by default ===
+ === Documents are truncated by default ===
  
- The indexer will be default truncate documents to {{{IndexWriter.DEFAULT_MAX_FIELD_LENGTH}}}
+ The indexer by default truncates documents to {{{IndexWriter.DEFAULT_MAX_FIELD_LENGTH}}}
- or 10,000 terms in Lucene 2.0. This limit 
+ or 10,000 terms in Lucene 2.0. 
- can easily be changed with 
- [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#setMaxFieldLength(int) IndexWriter.setMaxFieldLength()].
+ 
+ Rule of thumb: an average page of English text contains about 250 words. (Source: [http://answers.google.com/answers/threadview?id=608972 Google Answers].) This means only about 40 pages are indexed by default. If any of your documents are longer than this (and you want them indexed), you should raise the limit with [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#setMaxFieldLength(int) IndexWriter.setMaxFieldLength()].
  
  === Stopwords are removed ===
  

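The rule of thumb in the change above is plain arithmetic, and a minimal sketch of it may help when choosing a new limit. The class and method names below (FieldLengthEstimate, pagesCovered, limitForPages) are hypothetical and not part of Lucene. Only the 10,000-term default and the 250 words-per-page figure come from the page. The sketch also assumes one indexed term per word, which is only an approximation because analyzers may drop or split words.

```java
// Hypothetical helper, not part of Lucene: estimates how a term limit
// maps to pages of English text, using the figures quoted on the wiki page.
public class FieldLengthEstimate {
    // Default limit quoted for Lucene 2.0 (IndexWriter.DEFAULT_MAX_FIELD_LENGTH).
    static final int DEFAULT_MAX_FIELD_LENGTH = 10_000;
    // Rule-of-thumb words per page of English text.
    static final int WORDS_PER_PAGE = 250;

    // Approximate number of whole pages indexed before truncation,
    // assuming roughly one term per word.
    static int pagesCovered(int maxFieldLength) {
        return maxFieldLength / WORDS_PER_PAGE;
    }

    // Approximate term limit needed to index a document of the given length.
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    static int limitForPages(int pages) {
        return Math.multiplyExact(pages, WORDS_PER_PAGE);
    }

    public static void main(String[] args) {
        System.out.println(pagesCovered(DEFAULT_MAX_FIELD_LENGTH)); // prints 40
        System.out.println(limitForPages(300));                     // prints 75000
    }
}
```

Under these assumptions, a 300-page document would need a limit of about 75,000 terms, which is the value one would pass to IndexWriter.setMaxFieldLength(int). Adding some headroom is prudent, since the words-per-page figure is only an average.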