Mailing-List: contact java-commits-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-dev@lucene.apache.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
From: Apache Wiki <wikidiffs@apache.org>
To: Apache Wiki <wikidiffs@apache.org>
Date: Thu, 28 Jun 2012 18:20:06 -0000
Message-ID: <20120628182006.52749.25285@eos.apache.org>
Subject: [Lucene-java Wiki] Update of "ReleaseNote40alpha" by MikeMcCandless
Auto-Submitted: auto-generated

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The "ReleaseNote40alpha" page has been changed by MikeMcCandless:
http://wiki.apache.org/lucene-java/ReleaseNote40alpha?action=diff&rev1=1&rev2=2

  Lucene 4.0-alpha Release Highlights:

-  * The APIs for accessing terms, postings lists, stored fields, term vectors, etc
+  * The index formats for terms, postings lists, stored fields, term vectors, etc
     are pluggable via the Codec API. You can select from the provided
     implementations or customize the index format with your own Codec to meet your needs.

@@ -30, +30 @@

     scoring factors (accessible via Similarity), for pre-sorted Sort values, and more.

   * When indexing via multiple threads, each IndexWriter thread now flushes its own segment
-    to disk concurrently.
+    to disk concurrently, resulting in substantial performance improvements
+    (see http://blog.mikemccandless.com/2011/05/265-indexing-speedup-with-lucenes.html).

   * Per-document normalization factors ("norms") are no longer limited to a single byte.
-    Similarity implementations can use any DocValues type to store norms. 
+    Similarity implementations can use any DocValues type to store norms.

   * Added index statistics such as the number of tokens for a term or field, number of postings
     for a field, and number of documents with a posting for a field: these support additional
-    scoring models.
+    scoring models (see
+    http://blog.mikemccandless.com/2012/03/new-index-statistics-in-lucene-40.html).

   * Implemented a new default term dictionary/index (BlockTree) that indexes shared prefixes
-    instead of every n'th term ; this is not only more time- and space- efficient, but can
+    instead of every n'th term. This is not only more time- and space-efficient, but can
     also sometimes avoid going to disk at all for terms that do not exist. Alternative term
     dictionary implementations are provided and pluggable via the Codec API.

   * Added a number of alternative Codecs and components for different use-cases: "Appending"
     works with append-only filesystems (such as Hadoop DFS), "Memory" writes the entire
-    terms+postings as an FST read into RAM, "Pulsing" inlines the postings for low-frequency
-    terms into the term dictionary, "SimpleText" writes all files in plain-text for easy
-    debugging/transparency, among others.
+    terms+postings as an FST read into RAM (see
+    http://blog.mikemccandless.com/2011/06/primary-key-lookups-are-28x-faster-with.html),
+    "Pulsing" inlines the postings for low-frequency terms into the term dictionary,
+    "SimpleText" writes all files in plain-text for easy debugging/transparency, among others.

   * Term offsets can be optionally encoded into the postings lists and can be retrieved
     per-position.
+
+  * A new AutomatonQuery returns all documents containing any term matching a provided
+    finite-state automaton (see http://www.slideshare.net/otisg/finite-state-queries-in-lucene).
+
+  * FuzzyQuery is 100-200 times faster than in past releases (see
+    http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html); a new

   * Various in-memory data structures such as the term dictionary and FieldCache are represented
     more efficiently with less object overhead.
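
As a rough illustration of the AutomatonQuery highlight above (documents are returned when any of their terms is accepted by a finite-state automaton), here is a minimal, self-contained Java sketch. It is a toy model and not Lucene code: ToyAutomatonSearch, Dfa, search and demoDfa are hypothetical names, the postings are assumed to be a plain sorted map from term to document ids, and the automaton is hand-built to accept "lucene" and "lucent". The sketch checks every term in turn, which is the simplest possible strategy and is not claimed to be how Lucene evaluates the query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: none of these names are Lucene classes or methods.
class ToyAutomatonSearch {

    // A small deterministic finite-state automaton; state 0 is the start state.
    static final class Dfa {
        private final List<Map<Character, Integer>> transitions = new ArrayList<>();
        private final List<Boolean> accepting = new ArrayList<>();

        int addState(boolean accept) {
            transitions.add(new HashMap<>());
            accepting.add(accept);
            return transitions.size() - 1;
        }

        void addTransition(int from, char label, int to) {
            transitions.get(from).put(label, to);
        }

        boolean accepts(String term) {
            int state = 0;
            for (int i = 0; i < term.length(); i++) {
                Integer next = transitions.get(state).get(term.charAt(i));
                if (next == null) {
                    return false; // no transition for this character: reject
                }
                state = next;
            }
            return accepting.get(state);
        }
    }

    // Returns the sorted ids of all documents containing any accepted term.
    static List<Integer> search(SortedMap<String, int[]> postings, Dfa dfa) {
        TreeSet<Integer> hits = new TreeSet<>();
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            if (dfa.accepts(entry.getKey())) {
                for (int doc : entry.getValue()) {
                    hits.add(doc);
                }
            }
        }
        return new ArrayList<>(hits);
    }

    // Hand-built automaton accepting exactly "lucene" and "lucent".
    static Dfa demoDfa() {
        Dfa dfa = new Dfa();
        int state = dfa.addState(false);
        for (char c : "lucen".toCharArray()) {
            int next = dfa.addState(false);
            dfa.addTransition(state, c, next);
            state = next;
        }
        int end = dfa.addState(true);
        dfa.addTransition(state, 'e', end);
        dfa.addTransition(state, 't', end);
        return dfa;
    }

    public static void main(String[] args) {
        SortedMap<String, int[]> postings = new TreeMap<>();
        postings.put("lucene", new int[] {1, 3});
        postings.put("lucent", new int[] {2});
        postings.put("lucid", new int[] {4});
        postings.put("solr", new int[] {1, 5});
        System.out.println(search(postings, demoDfa())); // prints [1, 2, 3]
    }
}
```

Documents 1 and 3 match through "lucene" and document 2 through "lucent"; "lucid" and "solr" are rejected because the automaton has no transition for them.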