lucene-java-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-java Wiki] Update of "ReleaseNote40" by MikeMcCandless
Date Thu, 27 Sep 2012 11:00:21 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The "ReleaseNote40" page has been changed by MikeMcCandless:
http://wiki.apache.org/lucene-java/ReleaseNote40?action=diff&rev1=3&rev2=4

     such as BM25, Divergence from Randomness, Language Models, and Information-based models
     are provided (see http://www.lucidimagination.com/blog/2011/09/12/flexible-ranking-in-lucene-4).
  
-  * Added support for per-document values (DocValues). DocValues can be used for custom 
+  * The new doc values feature stores typed values per-document.  It
+    can be used for custom scoring factors (accessible via
-    scoring factors (accessible via Similarity), for pre-sorted Sort values, and more.
+    Similarity), for pre-sorted Sort values, and more. 
  
-  * When indexing via multiple threads, each IndexWriter thread now flushes its own segment
-    to disk concurrently, resulting in substantial performance improvements
+  * IndexWriter now flushes segments to disk concurrently, when the
+    application uses multiple threads for indexing, resulting in substantial performance improvements
     (see http://blog.mikemccandless.com/2011/05/265-indexing-speedup-with-lucenes.html).
  
   * Per-document normalization factors ("norms") are no longer limited to a single byte.
     Similarity implementations can use any DocValues type to store norms.
  
-  * Added index statistics such as the number of tokens for a term or field, number of postings
+  * New index statistics have been added, including the number of tokens for a term or field, number of postings
-    for a field, and number of documents with a posting for a field: these support additional
+    for a field, and number of documents with a posting for a field.  These support additional
     scoring models (see
     http://blog.mikemccandless.com/2012/03/new-index-statistics-in-lucene-40.html). 
  
-  * Implemented a new default term dictionary/index (BlockTree) that indexes shared prefixes
+  * A new default term dictionary/index (BlockTree) indexes shared prefixes
     instead of every n'th term. This is not only more time- and space-efficient, but can
-    also sometimes avoid going to disk at all for terms that do not exist. Alternative term
+    avoid going to disk at all for terms that do not exist in certain cases. Alternative term
     dictionary implementations are provided and pluggable via the Codec API.
  
-  * Indexed terms are no longer UTF-16 char sequences, instead terms can be any binary
+  * Indexed terms are no longer limited to UTF-16 char sequences; they can now be any binary
-    value encoded as byte arrays. By default, text terms are now encoded as UTF-8
+    value encoded as byte arrays. By default, text terms are encoded as UTF-8
-    bytes. Sort order of terms is now defined by their binary value, which is identical
+    bytes. Sort order of terms is defined by their binary value, which is identical
-    to UTF-8 sort order.
+    to UTF-8 (Unicode code point) sort order.
  
   * Substantially faster performance when using a Filter during searching.
  
   * File-system based directories can rate-limit the IO (MB/sec) of merge
     threads, to reduce IO contention between merging and searching threads.
  
-  * Added a number of alternative Codecs and components for different use-cases: "Appending"
+  * A number of alternative Codecs and components have been added: "Appending"
     works with append-only filesystems (such as Hadoop DFS), "Memory" writes the entire 
     terms+postings as an FST read into RAM (see
     http://blog.mikemccandless.com/2011/06/primary-key-lookups-are-28x-faster-with.html),
@@ -69, +70 @@

     cost of very high RAM consumption, "Block" uses a new index layout and compression scheme for
     improved performance, among others.
  
-  * Term offsets can be optionally encoded into the postings lists and can be retrieved
+  * Term offsets can be optionally encoded into the postings lists and retrieved
     per-position.
  
   * A new AutomatonQuery returns all documents containing any term matching a provided
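
As an illustration of the term sort order item in the diff above (terms
sorted by binary value, identical to UTF-8 / Unicode code point order), the
following is a minimal sketch in plain Java. It does not use any Lucene API;
the class and method names (TermOrderSketch, compareUnsignedBytes,
compareUtf8) are hypothetical and exist only for this example. It shows that
UTF-16 code unit order, which is what String.compareTo uses, can differ from
UTF-8 byte order once a supplementary character is involved.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of Lucene.
class TermOrderSketch {

    // Lexicographic comparison of two byte arrays, treating each
    // byte as an unsigned value (0..255).
    static int compareUnsignedBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Compare two strings by the binary value of their UTF-8 encoding.
    static int compareUtf8(String x, String y) {
        return compareUnsignedBytes(
            x.getBytes(StandardCharsets.UTF_8),
            y.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String bmp = "\uFF61";         // U+FF61, one UTF-16 code unit
        String supp = "\uD800\uDC00";  // U+10000, a surrogate pair

        // UTF-16 code unit order: 0xFF61 > 0xD800, so bmp sorts AFTER supp.
        System.out.println(bmp.compareTo(supp) > 0);    // prints true

        // UTF-8 byte order: EF BD A1 < F0 90 80 80, so bmp sorts BEFORE
        // supp, which matches code point order (U+FF61 < U+10000).
        System.out.println(compareUtf8(bmp, supp) < 0); // prints true
    }
}
```

The two orders only disagree when a supplementary character is compared with
a BMP character in the range U+E000 to U+FFFF, so most text sorts the same
way under both.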
