From: Apache Wiki
To: Apache Wiki
Date: Thu, 28 Jun 2012 18:20:06 -0000
Subject: [Lucene-java Wiki] Update of "ReleaseNote40alpha" by MikeMcCandless

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The "ReleaseNote40alpha" page has been changed by MikeMcCandless:
http://wiki.apache.org/lucene-java/ReleaseNote40alpha?action=diff&rev1=1&rev2=2

  Lucene 4.0-alpha Release Highlights:

- * The APIs for accessing terms, postings lists, stored fields, term vectors, etc
+ * The index formats for terms, postings lists, stored fields, term vectors, etc
  are pluggable via the Codec api. You can select from the provided
  implementations or customize the index format with your own Codec to meet your needs.

@@ -30, +30 @@

  scoring factors (accessible via Similarity), for pre-sorted Sort values, and more.

  * When indexing via multiple threads, each IndexWriter thread now flushes its own segment
- to disk concurrently.
+ to disk concurrently, resulting in substantial performance improvements
+ (see http://blog.mikemccandless.com/2011/05/265-indexing-speedup-with-lucenes.html).

  * Per-document normalization factors ("norms") are no longer limited to a single byte.
- Similarity implementations can use any DocValues type to store norms.
+ Similarity implementations can use any DocValues type to store norms.

  * Added index statistics such as the number of tokens for a term or field, number of postings
  for a field, and number of documents with a posting for a field: these support additional
- scoring models.
+ scoring models (see
+ http://blog.mikemccandless.com/2012/03/new-index-statistics-in-lucene-40.html).

  * Implemented a new default term dictionary/index (BlockTree) that indexes shared prefixes
- instead of every n'th term ; this is not only more time- and space- efficient, but can
+ instead of every n'th term. This is not only more time- and space- efficient, but can
  also sometimes avoid going to disk at all for terms that do not exist. Alternative term
  dictionary implementions are provided and pluggable via the Codec api.

  * Added a number of alternative Codecs and components for different use-cases: "Appending" works
  with append-only filesystems (such as Hadoop DFS), "Memory" writes the entire
- terms+postings as an FST read into RAM, "Pulsing" inlines the postings for low-frequency
- terms into the term dictionary, "SimpleText" writes all files in plain-text for easy
- debugging/transparency, among others.
+ terms+postings as an FST read into RAM (see
+ http://blog.mikemccandless.com/2011/06/primary-key-lookups-are-28x-faster-with.html),
+ "Pulsing" inlines the postings for low-frequency terms into the term dictionary,
+ "SimpleText" writes all files in plain-text for easy debugging/transparency, among others.

  * Term offsets can be optionally encoded into the postings lists and can be retrieved
  per-position.
+
+ * A new AutomatonQuery returns all documents containing any term matching a provided
+ finite-state automaton (see http://www.slideshare.net/otisg/finite-state-queries-in-lucene).
+
+ * FuzzyQuery is 100-200 times faster than in past releases (see
+ http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html); a new

  * Various in-memory data structures such as the term dictionary and FieldCache are represented
  more efficiently with less object overhead.
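
The AutomatonQuery item in the diff above describes matching every term accepted by a finite-state automaton and returning the documents that contain those terms. The following is a minimal plain-Java sketch of that idea only. It does not use Lucene, and the names `AutomatonMatchSketch`, `Dfa`, `matchingDocs`, and `catOrCar` are hypothetical and chosen for illustration. It scans every term linearly for simplicity, which is not a claim about how Lucene evaluates such queries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class AutomatonMatchSketch {

    /** A tiny deterministic finite automaton over chars (hypothetical; not Lucene's Automaton class). */
    static final class Dfa {
        private final List<Map<Character, Integer>> transitions = new ArrayList<>();
        private final Set<Integer> accepting = new HashSet<>();

        /** Adds a state and returns its id; the first state added (id 0) is the start state. */
        int addState(boolean accept) {
            transitions.add(new HashMap<>());
            int id = transitions.size() - 1;
            if (accept) {
                accepting.add(id);
            }
            return id;
        }

        void addTransition(int from, char c, int to) {
            transitions.get(from).put(c, to);
        }

        /** Runs the automaton over the whole string; a missing transition rejects immediately. */
        boolean accepts(String s) {
            int state = 0;
            for (int i = 0; i < s.length(); i++) {
                Integer next = transitions.get(state).get(s.charAt(i));
                if (next == null) {
                    return false;
                }
                state = next;
            }
            return accepting.contains(state);
        }
    }

    /** Returns the union of the document ids of every term the automaton accepts. */
    static SortedSet<Integer> matchingDocs(SortedMap<String, int[]> termToDocs, Dfa dfa) {
        SortedSet<Integer> docs = new TreeSet<>();
        for (Map.Entry<String, int[]> entry : termToDocs.entrySet()) {
            if (dfa.accepts(entry.getKey())) {
                for (int doc : entry.getValue()) {
                    docs.add(doc);
                }
            }
        }
        return docs;
    }

    /** Builds a DFA that accepts exactly "cat" and "car". */
    static Dfa catOrCar() {
        Dfa dfa = new Dfa();
        int start = dfa.addState(false);
        int afterC = dfa.addState(false);
        int afterCa = dfa.addState(false);
        int done = dfa.addState(true);
        dfa.addTransition(start, 'c', afterC);
        dfa.addTransition(afterC, 'a', afterCa);
        dfa.addTransition(afterCa, 't', done);
        dfa.addTransition(afterCa, 'r', done);
        return dfa;
    }

    public static void main(String[] args) {
        // Toy term dictionary: term -> ids of documents containing it (made-up data).
        SortedMap<String, int[]> termToDocs = new TreeMap<>();
        termToDocs.put("car", new int[] {1, 4});
        termToDocs.put("cart", new int[] {2});
        termToDocs.put("cat", new int[] {3, 4});
        termToDocs.put("dog", new int[] {5});

        System.out.println(matchingDocs(termToDocs, catOrCar()));
    }
}
```

Here "cart" and "dog" are rejected by the automaton, so only the documents listed under "car" and "cat" are collected.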