lucene-dev mailing list archives

From "Uwe Schindler" <>
Subject RE: [ANNOUNCE] Apache Lucene 4.0 released.
Date Fri, 12 Oct 2012 08:34:23 GMT
Thanks Robert for doing the hard work of managing this release!

I am happy that the release finally came out, after a long time of development, code refactoring,
and lots of non-finite beer-automatons!

Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen

> -----Original Message-----
> From: Robert Muir []
> Sent: Friday, October 12, 2012 10:10 AM
> To:; Lucene mailing list; java-user; announce
> Subject: [ANNOUNCE] Apache Lucene 4.0 released.
> October 12 2012, Apache Lucene™ 4.0 available.
> The Lucene PMC is pleased to announce the release of Apache Lucene 4.0
> Apache Lucene is a high-performance, full-featured text search engine library
> written entirely in Java. It is a technology suitable for nearly any application
> that requires full-text search, especially cross-platform.
> This release contains numerous bug fixes, optimizations, and improvements,
> some of which are highlighted below.  The release is available for immediate
> download at:
> See the CHANGES.txt file included with the release for a full list of details.
> Lucene 4.0 Release Highlights:
>  * The index formats for terms, postings lists, stored fields, term vectors, etc.
> are pluggable via the Codec API. You can select from the provided
> implementations or customize the index format with your own Codec to meet
> your needs.
>  * Similarity has been decoupled from the vector space model (TF/IDF).
> Additional models such as BM25, Divergence from Randomness, Language
> Models, and Information-based models are provided (see
> 4).
>  * The new doc values feature stores typed values per-document.  It can be
> used for custom scoring factors (accessible via Similarity), for pre-sorted Sort
> values, and more.
>  * IndexWriter now flushes segments to disk concurrently, when the application
> uses multiple threads for indexing, resulting in substantial performance
> improvements (see
> speedup-with-lucenes.html).
>  * Per-document normalization factors ("norms") are no longer limited to a
> single byte. Similarity implementations can use any DocValues type to store
> norms.
>  * New index statistics have been added, including the number of tokens for a
> term or field, number of postings for a field, and number of documents with a
> posting for a field.  These support additional scoring models (see
> 40.html).
>  * A new default term dictionary/index (BlockTree) indexes shared prefixes
> instead of every n'th term. This is not only more time- and
> space-efficient, but can, in certain cases, avoid going to disk at all for
> terms that do not exist. Alternative term dictionary implementations are
> provided and pluggable via the Codec API.
>  * Indexed terms are no longer limited to UTF-16 char sequences; they can now
> be any binary value encoded as byte arrays. By default, text terms are encoded
> as UTF-8 bytes. Sort order of terms is defined by their binary value, which is
> identical to UTF-8 (Unicode code point) sort order.
>  * Substantially faster performance when using a Filter during searching.
>  * File-system based directories can rate-limit the IO (MB/sec) of merge
> threads, to reduce IO contention between merging and searching threads.
>  * A number of alternative Codecs and components have been added:
> "Appending" works with append-only filesystems (such as Hadoop DFS),
> "Memory" writes the entire terms+postings as an FST read into RAM (see
> with.html),
> "Pulsing" inlines the postings for low-frequency terms into the term dictionary
> (see
> primary-key.html),
> "SimpleText" writes all files in plain-text for easy debugging/transparency (see
> "Bloom" uses a bloom filter to sometimes avoid disk seeks when looking up
> terms, "Direct" holds all postings as simple byte[] and int[] for very fast
> performance at the cost of very high RAM consumption, "Block" uses a new
> index layout and compression scheme for improved performance, among
> others.
>  * Term offsets can be optionally encoded into the postings lists and retrieved
> per-position.
>  * A new AutomatonQuery returns all documents containing any term matching
> a provided finite-state automaton (see
> state-queries-in-lucene).
>  * FuzzyQuery is 100-200 times faster than in past releases (see
> faster.html).
>  * A new spell checker, DirectSpellChecker, finds possible corrections directly
> against the main search index without requiring a separate index.
>  * Various in-memory data structures such as the term dictionary and
> FieldCache are represented more efficiently with less object overhead (see
> searching.html).
>  * All search logic is now required to work per segment; IndexReader was
> therefore refactored to differentiate between atomic and composite readers
> (see
>  * Lucene 4.0 provides a modular API, consolidating components such as
> Analyzers and Queries that were previously scattered across Lucene core,
> contrib, and Solr. These modules also include additional functionality such as
> UIMA analyzer integration and a completely reworked spatial search
> implementation.
> Noteworthy changes since 4.0-BETA:
>  * A new "Block" PostingsFormat offering improved search performance and
> index compression. This will likely become the default format in a future
> release. (see
> blockpostingsformat-thanks.html).
>  * All non-default codec implementations were moved to a separate codecs
> module. Just add lucene-codecs-4.0.0.jar to your classpath to test these out.
>  * Payloads can be optionally stored on the term vectors.
>  * Many bugfixes and optimizations.
> Please read CHANGES.txt and MIGRATE.txt for a full list of new features and
> notes on upgrading. In particular, the new APIs are not compatible with previous
> versions of Lucene; however, file format backwards compatibility is provided
> for indexes from the 3.0 series and the 4.0-alpha and -beta releases.
> Please report any feedback to the mailing lists
> (
> Note: The Apache Software Foundation uses an extensive mirroring network for
> distributing releases.  It is possible that the mirror you are using may not have
> replicated the release yet.  If that is the case, please try another mirror.  This
> also goes for Maven access.
> Happy searching,
> Apache Lucene/Solr Developers
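
A side note on the highlight about term sort order ("Sort order of terms is defined by their binary value, which is identical to UTF-8 (Unicode code point) sort order"): the difference from the old UTF-16 ordering only shows up once supplementary characters are involved. The following is a minimal sketch in plain Java, with no Lucene classes; the class name TermOrderDemo and its helper methods are made up for illustration and are not part of any Lucene API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class (not part of Lucene): compares two strings three ways.
class TermOrderDemo {

    // Encode a string as UTF-8 bytes.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Lexicographic comparison of byte arrays, treating each byte as unsigned.
    static int compareUnsignedBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Comparison by Unicode code point rather than by UTF-16 code unit.
    static int compareCodePoints(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // Whichever string still has characters left sorts last.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        String supplementary = "\uD801\uDC00"; // U+10400, stored as a surrogate pair
        String bmp = "\uFF5E";                 // U+FF5E, a single UTF-16 code unit

        // String.compareTo works on UTF-16 code units: 0xD801 < 0xFF5E.
        System.out.println("UTF-16 order:     " + Integer.signum(supplementary.compareTo(bmp)));
        // By code point, U+10400 > U+FF5E.
        System.out.println("code point order: " + Integer.signum(compareCodePoints(supplementary, bmp)));
        // UTF-8 bytes F0 90 90 80 vs EF BD 9E: first byte 0xF0 > 0xEF.
        System.out.println("UTF-8 byte order: " + Integer.signum(
                compareUnsignedBytes(utf8(supplementary), utf8(bmp))));
    }
}
```

The UTF-16 comparison puts U+10400 first, while the code point and UTF-8 byte comparisons agree with each other and put it last. An application that relied on String.compareTo order for terms could therefore see a different term order for such characters.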
