lucene-java-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-java Wiki] Update of "ConceptsAndDefinitions" by RenaudWaldura
Date Tue, 03 Jul 2007 21:20:36 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The following page has been changed by RenaudWaldura:
http://wiki.apache.org/lucene-java/ConceptsAndDefinitions

------------------------------------------------------------------------------
  
  This page contains concepts and definitions related to Lucene.  It is not a substitute for
knowledge in InformationRetrieval.
  
- 
- == Concepts ==
- 
- FILL IN HERE:  Basic ideas behind indexing, searching, Lucene in general, important classes,
etc.
- 
  == Definitions ==
  
- '''Please keep in alphabetical order when editing'''
+ ''Please keep in alphabetical order when editing''.
  
  '''[http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/Analyzer.html Analyzer]'''
- Lucene class used for preparing text for indexing.  Most applications can use the [http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/standard/StandardAnalyzer.html
StandardAnalyzer] for English and other Latin-based languages.
  
@@ -21, +16 @@

  
  '''Stemmer''' - From [http://en.wikipedia.org/wiki/Stemmer Wikipedia Stemmer]: "A stemming
algorithm, or stemmer, is a computer program or algorithm for reducing inflected (or sometimes
derived) words to their stem, base or root form — generally a written word form."  Stemmers
are often used to reduce the search space and index size.  A user searching for
"widgets" is often also interested in documents that contain the term "widget".
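
To make the "widgets" / "widget" example concrete, here is a minimal sketch of suffix stripping. The class `NaiveStemmer` and its two rules are hypothetical and chosen only for illustration; it is not a Lucene class and is far cruder than a real stemming algorithm such as Porter's.

```java
import java.util.Locale;

/** Hypothetical toy stemmer (an assumption for illustration, not part of Lucene). */
final class NaiveStemmer {

    /** Lowercases the word and strips a plural-looking suffix, if any. */
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        // "queries" -> "query"
        if (w.length() > 4 && w.endsWith("ies")) {
            return w.substring(0, w.length() - 3) + "y";
        }
        // "widgets" -> "widget", but leave words like "class" alone
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    public static void main(String[] args) {
        // Both forms map to the same stem, so they would share one index entry.
        System.out.println(stem("widgets") + " " + stem("widget"));
    }
}
```

If the same function is applied at index time and at query time, a search for "widgets" and a search for "widget" look up the same stem.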
  
- '''[http://lucene.apache.org/java/docs/api/org/apache/lucene/index/TermFreqVector.html TermFreqVector]'''
- A Term Frequency Vector (aka Term Vector) is a data structure containing a given Document's
term and frequency information and can be retrieved from the [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html
IndexReader] only when Term Vectors are stored during indexing.
+ == Core Classes ==
  
+ === Document ===
+ 
+ A Lucene 
+ [http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Document.html Document]
+ is a record in the index. A Document has a list of fields.
+ 
+ === Term ===
+ 
+ A [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/Term.html Term] is Lucene's
unit of indexing. A Term pairs a field name with a text value; in Western languages, that value is often a word.
+ 
+ === TermEnum ===
+ 
+ [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/TermEnum.html TermEnum]
is used to enumerate all terms in the index for a given field, regardless of which documents
the terms occur in (or where they occur).
+ 
+ Some Query subclasses are implemented by enumerating the terms that match a pattern and building
a large OR query from the enumeration, for example WildcardQuery, PrefixQuery and RangeQuery.
+ 
+ See the ["LuceneFAQ"] entry ''How do I retrieve all the values of a particular field that exists
within an index, across all documents?'', which also includes sample code.
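
The idea of expanding a pattern into a large OR query can be sketched without Lucene at all. The snippet below is a rough illustration that assumes the terms of one field are kept in sorted order; `ToyTermDictionary`, `expandPrefix` and `asOrQuery` are hypothetical names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sorted term dictionary for a single field (not a Lucene class). */
final class ToyTermDictionary {
    private final TreeSet<String> terms = new TreeSet<>();

    void add(String term) {
        terms.add(term);
    }

    /** Enumerates, in sorted order, every term that starts with the prefix. */
    List<String> expandPrefix(String prefix) {
        List<String> matches = new ArrayList<>();
        // Seek to the first term >= prefix; all matches are contiguous from there.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // past the last matching term
            }
            matches.add(term);
        }
        return matches;
    }

    /** Renders the enumerated terms as one OR expression, for display only. */
    static String asOrQuery(List<String> expanded) {
        return String.join(" OR ", expanded);
    }
}
```

With the terms "apple", "wide", "widget", "widgets" and "window" in the dictionary, `expandPrefix("wid")` yields "wide", "widget" and "widgets". Note that no document is consulted during the enumeration, which matches the description of TermEnum above.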
+ 
+ === TermDocs ===
+ 
+ Unlike TermEnum (see above), [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/TermDocs.html
TermDocs] is used to identify which documents contain a given Term. TermDocs also gives the
frequency of the term in the document.
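
As a rough sketch of the information TermDocs exposes, the toy structure below maps each term to the documents that contain it, together with the per-document frequency. `ToyPostings` is a hypothetical class, and whitespace tokenization is an assumption made to keep the example short; Lucene's own storage and API differ.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory postings: term -> (document id -> frequency). Not Lucene API. */
final class ToyPostings {
    private final Map<String, TreeMap<Integer, Integer>> postings = new HashMap<>();

    /** Splits the text on whitespace and counts each lowercased token for this document. */
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(token, k -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    /** Returns document id -> frequency for the term, ordered by document id. */
    Map<Integer, Integer> termDocs(String term) {
        TreeMap<Integer, Integer> docs = postings.get(term);
        return docs == null
                ? Collections.<Integer, Integer>emptyMap()
                : Collections.unmodifiableMap(docs);
    }
}
```

The lookup goes from a term to documents, which is the opposite direction from the per-document view described in the TermFreqVector section.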
+ 
+ === TermFreqVector ===
+ 
+ A [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/TermFreqVector.html TermFreqVector]
(aka Term Frequency Vector or just Term Vector) is a data structure containing a given Document's
term and frequency information. It can be retrieved from the [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html
IndexReader] only if Term Vectors were stored during indexing.
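
A term vector can be pictured as the terms of one document's field alongside their counts. The sketch below is an illustration only: `ToyTermVector`, its parallel-array layout and the whitespace tokenization are assumptions, not Lucene's implementation.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-document term vector: sorted terms with parallel counts. Not Lucene API. */
final class ToyTermVector {
    final String[] terms;      // distinct terms, sorted
    final int[] frequencies;   // frequencies[i] is the count of terms[i]

    ToyTermVector(String fieldText) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String token : fieldText.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        terms = new String[counts.size()];
        frequencies = new int[counts.size()];
        int i = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            terms[i] = e.getKey();
            frequencies[i] = e.getValue();
            i++;
        }
    }

    /** Returns how often the term occurs in this document, or 0 if absent. */
    int frequencyOf(String term) {
        int idx = Arrays.binarySearch(terms, term);
        return idx >= 0 ? frequencies[idx] : 0;
    }
}
```

Because this structure is built per document, it has to be computed and kept at indexing time, which is consistent with the note above that it is available only when Term Vectors are stored.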
+ 
+ === Directory ===
+ 
+ === IndexReader ===
+ 
+ === IndexSearcher ===
+ 
