lucene-java-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-java Wiki] Update of "ReleaseNote40alpha" by MikeMcCandless
Date Thu, 28 Jun 2012 18:20:06 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The "ReleaseNote40alpha" page has been changed by MikeMcCandless:
http://wiki.apache.org/lucene-java/ReleaseNote40alpha?action=diff&rev1=1&rev2=2

  
  Lucene 4.0-alpha Release Highlights:
  
-  * The APIs for accessing terms, postings lists, stored fields, term vectors, etc 
+  * The index formats for terms, postings lists, stored fields, term vectors, etc 
     are pluggable via the Codec api. You can select from the provided 
     implementations or customize the index format with your own Codec to meet your needs.
  
@@ -30, +30 @@

     scoring factors (accessible via Similarity), for pre-sorted Sort values, and more.
  
   * When indexing via multiple threads, each IndexWriter thread now flushes its own segment
-    to disk concurrently.
+    to disk concurrently, resulting in substantial performance improvements
+    (see http://blog.mikemccandless.com/2011/05/265-indexing-speedup-with-lucenes.html).
  
   * Per-document normalization factors ("norms") are no longer limited to a single byte.
-    Similarity implementations can use any DocValues type to store norms. 
+    Similarity implementations can use any DocValues type to store norms.
  
   * Added index statistics such as the number of tokens for a term or field, number of postings
     for a field, and number of documents with a posting for a field: these support additional
-    scoring models.
+    scoring models (see
+    http://blog.mikemccandless.com/2012/03/new-index-statistics-in-lucene-40.html). 
  
   * Implemented a new default term dictionary/index (BlockTree) that indexes shared prefixes
-    instead of every n'th term ; this is not only more time- and space- efficient, but can
+    instead of every n'th term. This is not only more time- and space- efficient, but can
     also sometimes avoid going to disk at all for terms that do not exist. Alternative term
     dictionary implementations are provided and pluggable via the Codec api.
  
   * Added a number of alternative Codecs and components for different use-cases: "Appending"
     works with append-only filesystems (such as Hadoop DFS), "Memory" writes the entire 
-    terms+postings as an FST read into RAM, "Pulsing" inlines the postings for low-frequency
-    terms into the term dictionary, "SimpleText" writes all files in plain-text for easy
-    debugging/transparency, among others.
+    terms+postings as an FST read into RAM (see
+    http://blog.mikemccandless.com/2011/06/primary-key-lookups-are-28x-faster-with.html),
+    "Pulsing" inlines the postings for low-frequency terms into the term dictionary,
+    "SimpleText" writes all files in plain-text for easy debugging/transparency, among others.
  
   * Term offsets can be optionally encoded into the postings lists and can be retrieved
     per-position.
+ 
+  * A new AutomatonQuery returns all documents containing any term matching a provided
+    finite-state automaton (see http://www.slideshare.net/otisg/finite-state-queries-in-lucene).
+ 
+  * FuzzyQuery is 100-200 times faster than in past releases (see
+    http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html); a new
  
   * Various in-memory data structures such as the term dictionary and FieldCache are represented
     more efficiently with less object overhead.
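
To illustrate the AutomatonQuery highlight above, here is a minimal, self-contained sketch of the idea: walk a term dictionary, keep the terms that a finite-state automaton accepts, and union their postings. It does not use Lucene's API. The class AutomatonTermMatchSketch, its Dfa type, and the sample terms and document ids are all hypothetical names invented for this example. It also scans every term linearly for simplicity, which a real implementation would likely avoid.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustration only: NOT Lucene's AutomatonQuery, just the concept behind it.
class AutomatonTermMatchSketch {

    /** A tiny deterministic automaton over chars; state 0 is the start state. */
    static final class Dfa {
        private final List<Map<Character, Integer>> transitions = new ArrayList<>();
        private final Set<Integer> accepting = new HashSet<>();

        int addState(boolean accept) {
            transitions.add(new HashMap<>());
            int id = transitions.size() - 1;
            if (accept) {
                accepting.add(id);
            }
            return id;
        }

        void addTransition(int from, char c, int to) {
            transitions.get(from).put(c, to);
        }

        /** Runs the automaton over the whole string; a missing transition rejects. */
        boolean accepts(String s) {
            int state = 0;
            for (int i = 0; i < s.length(); i++) {
                Integer next = transitions.get(state).get(s.charAt(i));
                if (next == null) {
                    return false;
                }
                state = next;
            }
            return accepting.contains(state);
        }
    }

    /** Unions the doc ids of every term in the (sorted) dictionary that the automaton accepts. */
    static SortedSet<Integer> matchingDocs(SortedMap<String, int[]> postings, Dfa dfa) {
        SortedSet<Integer> docs = new TreeSet<>();
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            if (dfa.accepts(entry.getKey())) {
                for (int doc : entry.getValue()) {
                    docs.add(doc);
                }
            }
        }
        return docs;
    }

    /** Hypothetical automaton accepting exactly: cat, car, cats, cars. */
    static Dfa exampleDfa() {
        Dfa dfa = new Dfa();
        int s0 = dfa.addState(false);
        int s1 = dfa.addState(false);
        int s2 = dfa.addState(false);
        int s3 = dfa.addState(true);
        int s4 = dfa.addState(true);
        dfa.addTransition(s0, 'c', s1);
        dfa.addTransition(s1, 'a', s2);
        dfa.addTransition(s2, 't', s3);
        dfa.addTransition(s2, 'r', s3);
        dfa.addTransition(s3, 's', s4);
        return dfa;
    }

    /** Hypothetical term -> doc id postings, made up for this example. */
    static SortedMap<String, int[]> examplePostings() {
        SortedMap<String, int[]> postings = new TreeMap<>();
        postings.put("cab", new int[] {0});
        postings.put("car", new int[] {1, 4});
        postings.put("cat", new int[] {2});
        postings.put("cats", new int[] {2, 3});
        postings.put("dog", new int[] {5});
        return postings;
    }

    public static void main(String[] args) {
        // "car", "cat" and "cats" are accepted; "cab" and "dog" are not.
        System.out.println(matchingDocs(examplePostings(), exampleDfa())); // prints [1, 2, 3, 4]
    }
}
```

The sketch keeps the dictionary in a sorted map because term dictionaries are ordered; that ordering is what would let an implementation skip whole ranges of terms the automaton cannot accept, rather than testing each one as done here.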
