lucene-java-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-java Wiki] Update of "LuceneCaveats" by RenaudWaldura
Date Fri, 29 Jun 2007 22:23:00 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The following page has been changed by RenaudWaldura:
http://wiki.apache.org/lucene-java/LuceneCaveats

The comment on the change is:
maxFieldLength rule of thumb

------------------------------------------------------------------------------
  === Luke is your friend ===
  
  Luke is an invaluable tool to learn what actually went in your index.
- If it's not in the index, you can't query it. 
+ If it's not in the index, you can't query it. [http://www.getopt.org/luke/ Luke]
  
  === Use the same analyzer for indexing and querying ===
  
@@ -29, +29 @@

  The indexer by default truncates each field to {{{IndexWriter.DEFAULT_MAX_FIELD_LENGTH}}}
  terms, which is 10,000 in Lucene 2.0. 
  
- Rule of thumb: an average page of English text contains about 250 words. (Source: [http://answers.google.com/answers/threadview?id=608972 Google Answers].) This means only about 40 pages are indexed by default. If any of your documents are longer than this (and you want them indexed), you should raise the limit with [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#setMaxFieldLength(int) IndexWriter.setMaxFieldLength()].
+ Rule of thumb: an average page of English text contains about 250 words. (Source: [http://answers.google.com/answers/threadview?id=608972 Google Answers].) This means only about 40 pages are indexed by default. If any of your documents are longer than this (and you want them indexed in full), you should raise the limit with [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#setMaxFieldLength(int) IndexWriter.setMaxFieldLength()].
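
The rule of thumb is plain division, so it can be turned around to estimate a limit for longer documents. Below is a rough sketch in plain Java with no Lucene classes involved. The class name, the method names and the headroom percentage are made up for illustration, and 250 words per page is only the estimate quoted above, so treat the result as a starting point rather than a precise figure.

```java
// Back-of-the-envelope arithmetic for picking a maxFieldLength value.
// All names here are hypothetical; nothing below is part of Lucene.
final class MaxFieldLengthEstimate {
    // Rule-of-thumb figure quoted in the text; real documents vary widely.
    static final int WORDS_PER_PAGE = 250;
    // Default limit quoted in the text for Lucene 2.0.
    static final int DEFAULT_MAX_FIELD_LENGTH = 10000;

    /** Number of whole pages covered by a given term limit. */
    static int pagesCovered(int maxFieldLength) {
        return maxFieldLength / WORDS_PER_PAGE;
    }

    /** Term limit needed for the given page count, plus a safety margin in percent. */
    static int termsNeeded(int pages, int headroomPercent) {
        long terms = (long) pages * WORDS_PER_PAGE;
        long withHeadroom = terms + terms * headroomPercent / 100;
        // Clamp so the result still fits the int parameter of setMaxFieldLength(int).
        return (int) Math.min(Integer.MAX_VALUE, withHeadroom);
    }

    public static void main(String[] args) {
        System.out.println(pagesCovered(DEFAULT_MAX_FIELD_LENGTH)); // 10000 / 250 = 40
        System.out.println(termsNeeded(300, 20)); // 75000 terms plus 20% = 90000
    }
}
```

The value returned by termsNeeded() is the kind of number one might pass to IndexWriter.setMaxFieldLength(). Words and indexed terms are not quite the same thing (the analyzer may drop or split tokens), which is one reason to leave some headroom.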
  
  === Stopwords are removed ===
  
@@ -48, +48 @@

  Especially in a Web application, where keeping state between requests 
  requires additional work, it is tempting to open and close the index 
  on every request. Unfortunately, this leads to very poor performance.
- At first this might work with small indexes or beefy hardware, but you
+ At first this might work with small indexes or beefy hardware, but 
- will soon run into performance problems -- e.g. large garbage collections.
+ performance problems soon crop up -- e.g. large garbage collections.
  
  You should keep the index open as long as possible. Both 
  [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html IndexReader]
@@ -64, +64 @@

  === No need to cache search results ===
  
  Lucene is amazingly fast at searching. Rather than caching hits
- and paging through them, merely re-executing the query is almost always 
+ and paging through them, merely re-executing the query is often 
  fast enough. 
  
  See: ["LuceneFAQ"], "How do I implement paging, i.e. showing result from 1-10, 11-20 etc?".
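
The paging idea can be sketched without any Lucene classes: each request re-runs the query and keeps only the slice it needs, so nothing has to be cached between requests. In the sketch below the ranked list stands in for the hits of a freshly executed query, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stateless paging: slice a freshly computed, ranked result list per request.
// The list stands in for search hits; this is not Lucene API.
final class PagingSketch {
    /** Returns the 1-based page of the given size, or an empty list past the end. */
    static <T> List<T> page(List<T> rankedHits, int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        long from = (long) (pageNumber - 1) * pageSize;
        if (from >= rankedHits.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(rankedHits.size(), from + pageSize);
        // Copy the slice so the caller does not hold on to the full hit list.
        return new ArrayList<T>(rankedHits.subList((int) from, to));
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<Integer>();
        for (int i = 1; i <= 25; i++) {
            hits.add(i); // pretend these are document ids, best match first
        }
        System.out.println(page(hits, 2, 10)); // results 11-20
        System.out.println(page(hits, 3, 10)); // results 21-25
    }
}
```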
@@ -93, +93 @@

  
  Parsing free text is a surprisingly hard problem at which 
  [http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html QueryParser]

- does a pretty good job. Rather than editing the query string, change the Query 
+ does a pretty good job. Rather than editing the query string, change the 
+ [http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Query.html Query] 
  objects returned by the
  [http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html QueryParser].
  
@@ -105, +106 @@

  
  === Lucene is not a true boolean system ===
  
- Or: {{{apple AND banana OR orange}}} doesn't work. 
+ Or: {{{apple AND banana OR orange}}} doesn't work. Surprising at first.
  [http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html QueryParser]
- does its best to 
- translate from a boolean syntax to Lucene's own set-oriented queries, but
+ does its best to translate from a boolean syntax to Lucene's own set-oriented queries, but
- it falls short. Either use parens everywhere or try to design your user
+ it falls short. Either use parentheses everywhere or try to design your user
  interface accordingly.
  
  See BooleanQuerySyntax.
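
To see why the example query surprises people, here is a small simulation in plain Java. It does not use Lucene at all: the Occur enum and the matching rule only imitate required, optional and prohibited clauses, and the sketch assumes that the parser flattens {{{apple AND banana OR orange}}} into "apple required, banana required, orange optional". That flattening is an assumption about parser behaviour, not something this page states, so check the parsed query in your own version.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of set-oriented clauses versus strict boolean precedence.
// Hypothetical names throughout; this only imitates the idea, not Lucene's code.
final class OccurSketch {
    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Clause {
        final String term;
        final Occur occur;
        Clause(String term, Occur occur) {
            this.term = term;
            this.occur = occur;
        }
    }

    /** Set-oriented matching over a flat clause list. */
    static boolean matchesFlat(Set<String> doc, List<Clause> clauses) {
        boolean anyMust = false;
        boolean shouldHit = false;
        for (Clause c : clauses) {
            boolean present = doc.contains(c.term);
            switch (c.occur) {
                case MUST:
                    anyMust = true;
                    if (!present) return false;
                    break;
                case MUST_NOT:
                    if (present) return false;
                    break;
                default: // SHOULD
                    if (present) shouldHit = true;
                    break;
            }
        }
        // With a required clause present, optional clauses no longer decide matching.
        // Without one, at least one optional clause has to match.
        return anyMust || shouldHit;
    }

    /** What a reader used to boolean logic expects: (apple AND banana) OR orange. */
    static boolean matchesStrict(Set<String> doc) {
        return (doc.contains("apple") && doc.contains("banana")) || doc.contains("orange");
    }

    public static void main(String[] args) {
        // Assumed flattening of "apple AND banana OR orange".
        List<Clause> flat = Arrays.asList(
                new Clause("apple", Occur.MUST),
                new Clause("banana", Occur.MUST),
                new Clause("orange", Occur.SHOULD));
        Set<String> onlyOrange = new HashSet<String>(Arrays.asList("orange"));
        System.out.println(matchesStrict(onlyOrange));     // true
        System.out.println(matchesFlat(onlyOrange, flat)); // false
    }
}
```

Under that assumption, a document containing only "orange" satisfies the strict boolean reading but not the flattened clause list, which is why explicit parentheses are the safer habit.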
