lucene-dev mailing list archives

From lucene-...@jakarta.apache.org
Subject [Jakarta Lucene Wiki] Updated: LuceneFAQ
Date Mon, 03 Jan 2005 21:38:31 GMT
   Date: 2005-01-03T13:38:31
   Editor: DanielNaber
   Wiki: Jakarta Lucene Wiki
   Page: LuceneFAQ
   URL: http://wiki.apache.org/jakarta-lucene/LuceneFAQ

   avoid useless links

Change Log:

------------------------------------------------------------------------------
@@ -4,7 +4,7 @@
 
 [[TableOfContents]]
 
-== FAQ ==
+== Lucene FAQ ==
 
 === General ===
 
@@ -71,7 +71,7 @@
 
 === Searching ===
 
-==== Why am i getting no hits / incorrect hits? ====
+==== Why am I getting no hits / incorrect hits? ====
 
 Some possible causes:
 
@@ -79,10 +79,10 @@
  * The term is in a field that was not tokenized during indexing and therefore, the entire
content of the field was considered as a single term. Re-index the documents and make sure
the field is tokenized. 
  * The field specified in the query simply does not exist. You won't get an error message
in this case, you'll just get no matches.
  * The field specified in the query has wrong case. Field names are case sensitive.
- * The term you are searching is a stop word that was dropped by the analyzer you use. For
example, if your analyzer uses the StopFilter, a search for the word 'the' will always fail
(i.e. produce no hits).
+ * The term you are searching is a stop word that was dropped by the analyzer you use. For
example, if your analyzer uses the !StopFilter, a search for the word 'the' will always fail
(i.e. produce no hits).
  * You are using different analyzers (or the same analyzer but with different stop words)
for indexing and searching and as a result, the same term is transformed differently during
indexing and searching.
- * The analyzer you are using is case sensitive (e.g. it does not use the LowerCaseFilter)
and the term in the query has different case than the term in the document. 
- * The documents you are indexing are very large. Lucene by default only indexes the first
10,000 terms of a document to avoid OutOfMemory errors. See [http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#maxFieldLength
IndexWriter.maxFieldLength].
+ * The analyzer you are using is case sensitive (e.g. it does not use the !LowerCaseFilter)
and the term in the query has different case than the term in the document. 
+ * The documents you are indexing are very large. Lucene by default only indexes the first
10,000 terms of a document to avoid !OutOfMemory errors. See [http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#maxFieldLength
IndexWriter.maxFieldLength].
  
 If none of the possible causes above apply to your case, this will help you to debug the
problem:
 
@@ -117,7 +117,7 @@
 
 Another wild card character that you can use is '?', a question mark.  The ? will match a
single character.  This allows you to perform queries such as ''Bra?il''. Such a query will
match both ''Brasil'' and ''Brazil''.  Lucene refers to this type of a query as a 'wildcard
query'.
 
-'''Note''': Leading wildcards (e.g. ''*ook'') are '''not''' supported by the QueryParser.
+'''Note''': Leading wildcards (e.g. ''*ook'') are '''not''' supported by the !QueryParser.
 
 
 ==== Is the QueryParser thread-safe? ====
@@ -171,7 +171,7 @@
 By default, `slop` is set to 0 so that only exact phrases will match.
 However, you can alter the value using the `setSlop(int)` method.
 
-When using QueryParser you can use this syntax to specify the slop: "doug cutting"~2 will
find documents that contain "doug cutting" as well as ones that contain "cutting doug".
+When using !QueryParser you can use this syntax to specify the slop: "doug cutting"~2 will
find documents that contain "doug cutting" as well as ones that contain "cutting doug".
 
 
 ==== Are Wildcard, Prefix, and Fuzzy queries case sensitive? ====
@@ -209,7 +209,7 @@
 
 ==== Is the IndexSearcher thread-safe? ====
 
-'''Yes''', IndexSearcher is thread-safe.  Multiple search threads may access the index concurrently
without any problems.
+Yes, !IndexSearcher is thread-safe.  Multiple search threads may access the index concurrently
without any problems.
 
 
 ==== Is there a way to retrieve the original term positions during the search? ====
@@ -272,12 +272,12 @@
 
 ==== How do I perform a simple indexing of a set of documents? ====
 
-The easiest way is to re-index the entire document set periodically or whenever it changes.
All you need to do is to create an instance of IndexWriter(), iterate over your document set,
create for each document a Lucene Document object and add it to the IndexWriter. When you
are done make sure to close the IndexWriter. This will release all of its resources and will
close the files it created. 
+The easiest way is to re-index the entire document set periodically or whenever it changes.
All you need to do is to create an instance of !IndexWriter(), iterate over your document
set, create for each document a Lucene Document object and add it to the !IndexWriter. When
you are done make sure to close the !IndexWriter. This will release all of its resources and
will close the files it created. 
 
 
 ==== How can I add document(s) to the index? ====
 
-Simply create an IndexWriter and use its addDocument() method. Make sure to create the IndexWriter
with the 'create' flag set to false and make sure to close the IndexWriter when you are done
adding the documents.
+Simply create an !IndexWriter and use its addDocument() method. Make sure to create the !IndexWriter
with the 'create' flag set to false and make sure to close the !IndexWriter when you are done
adding the documents.
 
 
 ==== Where does Lucene store the index it builds? ====
@@ -345,7 +345,7 @@
 
 ==== What is index optimization and when should I use it? ====
 
-The IndexWriter class supports an optimize() method that compacts the index database and
speedup queries. You may want to use this method after performing a complete indexing of your
document set or after incremental updates of the index. If your incremental update adds documents
frequently, you want to perform the optimization only once in a while to avoid the extra overhead
of the optimization.
+The !IndexWriter class supports an optimize() method that compacts the index database and
speedup queries. You may want to use this method after performing a complete indexing of your
document set or after incremental updates of the index. If your incremental update adds documents
frequently, you want to perform the optimization only once in a while to avoid the extra overhead
of the optimization.
 
 ==== What are Segments? ====
 
@@ -384,16 +384,16 @@
 The write.lock is used to keep processes from concurrently attempting
 to modify an index. 
 
-It is obtained by an `IndexWriter` while it is open, and by an `IndexReader` once documents
have been deleted and until it is closed.
+It is obtained by an !IndexWriter while it is open, and by an !IndexReader once documents
have been deleted and until it is closed.
 
 
 ==== What is the purpose of the commit.lock file, when is it used, and by which classes?
====
 
 The commit.lock file is used to coordinate the contents of the 'segments'
-file with the files in the index.  It is obtained by an `IndexReader` before it reads the
'segments' file, which names all of the other files in the
-index, and until the `IndexReader` has opened all of these other files.
+file with the files in the index.  It is obtained by an !IndexReader before it reads the
'segments' file, which names all of the other files in the
+index, and until the !IndexReader has opened all of these other files.
 
-The commit.lock is also obtained by the `IndexWriter` when it is about to write the segments
file and until it has finished trying to delete obsolete index files.
+The commit.lock is also obtained by the !IndexWriter when it is about to write the segments
file and until it has finished trying to delete obsolete index files.
 
 The commit.lock should thus never be held for long, since while
 it is obtained files are only opened or deleted, and one small file is
@@ -484,7 +484,7 @@
 and content.xml to get the document's content. Add these to the Lucene index,
 typically using one Lucene field per property.
 
-Note that this applies to OpenOffice.org 1.x, things might change a bit for OpenOffice.org
+Note that this applies to !OpenOffice.org 1.x, things might change a bit for !OpenOffice.org
 2.x, but the basic approach will still be the same.
 
 
@@ -545,9 +545,9 @@
 
 ==== What is the difference between IndexWriter.addIndexes(IndexReader[]) and IndexWriter.addIndexes(Directory[]),
besides them taking different arguments? ====
 
-When merging lots of indexes (more than the mergeFactor), the Directory-based method will
use fewer file handles and less memory, as it will only ever open mergeFactor indexes at once,
while the IndexReader-based method requires that all indexes be open when passed.
+When merging lots of indexes (more than the mergeFactor), the Directory-based method will
use fewer file handles and less memory, as it will only ever open mergeFactor indexes at once,
while the !IndexReader-based method requires that all indexes be open when passed.
 
-The primary advantage of the IndexReader-based method is that one can pass it IndexReaders
that don't reside in a Directory.
+The primary advantage of the !IndexReader-based method is that one can pass it !IndexReaders
that don't reside in a Directory.
 
 
 ==== Can I use Lucene to index text in Chinese, Japanese, Korean, and other multi-byte character
sets? ====

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

