lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: [ANNOUNCE] Apache Lucene 4.0 released.
Date Fri, 12 Oct 2012 08:34:23 GMT
Thanks Robert for doing the hard work of managing this release!

I am happy that the release finally came out, after a long time of development, code refactoring,
and lots of non-finite beer-automatons!

Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Robert Muir [mailto:rmuir@apache.org]
> Sent: Friday, October 12, 2012 10:10 AM
> To: dev@lucene.apache.org; Lucene mailing list; java-user; announce
> Subject: [ANNOUNCE] Apache Lucene 4.0 released.
> 
> October 12 2012, Apache Lucene 4.0 available.
> The Lucene PMC is pleased to announce the release of Apache Lucene 4.0
> 
> Apache Lucene is a high-performance, full-featured text search engine library
> written entirely in Java. It is a technology suitable for nearly any application
> that requires full-text search, especially cross-platform.
> 
> This release contains numerous bug fixes, optimizations, and improvements,
> some of which are highlighted below.  The release is available for immediate
> download at:
>    http://lucene.apache.org/core/mirrors-core-latest-redir.html
> 
> See the CHANGES.txt file included with the release for a full list of details.
> 
> Lucene 4.0 Release Highlights:
> 
>  * The index formats for terms, postings lists, stored fields, term vectors, etc.
> are pluggable via the Codec API. You can select from the provided
> implementations or customize the index format with your own Codec to meet
> your needs.
> 
>  * Similarity has been decoupled from the vector space model (TF/IDF).
> Additional models such as BM25, Divergence from Randomness, Language
> Models, and Information-based models are provided (see
> http://www.lucidimagination.com/blog/2011/09/12/flexible-ranking-in-lucene-4).
> 
>  * The new doc values feature stores typed values per-document.  It can be
> used for custom scoring factors (accessible via Similarity), for pre-sorted Sort
> values, and more.
> 
>  * IndexWriter now flushes segments to disk concurrently when the application
> uses multiple threads for indexing, resulting in substantial performance
> improvements (see http://blog.mikemccandless.com/2011/05/265-indexing-speedup-with-lucenes.html).
> 
>  * Per-document normalization factors ("norms") are no longer limited to a
> single byte. Similarity implementations can use any DocValues type to store
> norms.
> 
>  * New index statistics have been added, including the number of tokens for a
> term or field, number of postings for a field, and number of documents with a
> posting for a field.  These support additional scoring models (see
> http://blog.mikemccandless.com/2012/03/new-index-statistics-in-lucene-40.html).
> 
>  * A new default term dictionary/index (BlockTree) indexes shared prefixes
> instead of every n'th term. This is not only more time- and space-efficient,
> but in certain cases can avoid going to disk at all for terms that do not exist.
> Alternative term dictionary implementations are provided and pluggable via
> the Codec API.
> 
>  * Indexed terms are no longer limited to UTF-16 char sequences; they can now
> be any binary value encoded as byte arrays. By default, text terms are encoded
> as UTF-8 bytes. Sort order of terms is defined by their binary value, which is
> identical to UTF-8 (Unicode code point) sort order.
> 
>  * Substantially faster performance when using a Filter during searching.
> 
>  * File-system based directories can rate-limit the IO (MB/sec) of merge
> threads, to reduce IO contention between merging and searching threads.
> 
>  * A number of alternative Codecs and components have been added:
> "Appending" works with append-only filesystems (such as Hadoop DFS),
> "Memory" writes the entire terms+postings as an FST read into RAM (see
> http://blog.mikemccandless.com/2011/06/primary-key-lookups-are-28x-faster-with.html),
> "Pulsing" inlines the postings for low-frequency terms into the term dictionary
> (see http://blog.mikemccandless.com/2010/06/lucenes-pulsingcodec-on-primary-key.html),
> "SimpleText" writes all files in plain-text for easy debugging/transparency (see
> http://blog.mikemccandless.com/2010/10/lucenes-simpletext-codec.html),
> "Bloom" uses a bloom filter to sometimes avoid disk seeks when looking up
> terms, "Direct" holds all postings as simple byte[] and int[] for very fast
> performance at the cost of very high RAM consumption, "Block" uses a new
> index layout and compression scheme for improved performance, among
> others.
> 
>  * Term offsets can be optionally encoded into the postings lists and retrieved
> per-position.
> 
>  * A new AutomatonQuery returns all documents containing any term matching
> a provided finite-state automaton (see http://www.slideshare.net/otisg/finite-state-queries-in-lucene).
> 
>  * FuzzyQuery is 100-200 times faster than in past releases (see
> http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html).
> 
>  * A new spell checker, DirectSpellChecker, finds possible corrections directly
> against the main search index without requiring a separate index.
> 
>  * Various in-memory data structures such as the term dictionary and
> FieldCache are represented more efficiently with less object overhead (see
> http://blog.mikemccandless.com/2010/07/lucenes-ram-usage-for-searching.html).
> 
>  * All search logic is now required to work per segment; IndexReader was
> therefore refactored to differentiate between atomic and composite readers
> (see http://blog.thetaphi.de/2012/02/is-your-indexreader-atomic-major.html).
> 
>  * Lucene 4.0 provides a modular API, consolidating components such as
> Analyzers and Queries that were previously scattered across Lucene core,
> contrib, and Solr. These modules also include additional functionality such as
> UIMA analyzer integration and a completely reworked spatial search
> implementation.
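
The Similarity item above lists BM25 among the new ranking models. As a rough illustration of what such a model computes, here is the textbook Okapi BM25 score for a single query term in plain Java. It is a standalone sketch, not Lucene's BM25Similarity; the class name, the "1 +" idf variant and the k1 = 1.2, b = 0.75 constants are assumptions chosen for illustration.

```java
/**
 * Hypothetical standalone sketch of textbook Okapi BM25 for one query term.
 * Not Lucene's BM25Similarity; constants and names are illustrative.
 */
final class Bm25Sketch {
    // Commonly cited defaults from the IR literature (assumed here).
    static final double K1 = 1.2; // term-frequency saturation
    static final double B = 0.75; // strength of document-length normalization

    /** Inverse document frequency, "1 +" variant so it never goes negative. */
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /**
     * @param termFreq     occurrences of the term in this document's field
     * @param docLength    number of tokens in this document's field
     * @param avgDocLength average field length over the collection
     * @param docCount     documents in the collection
     * @param docFreq      documents containing the term
     */
    static double score(double termFreq, double docLength, double avgDocLength,
                        long docCount, long docFreq) {
        // Longer-than-average fields get a larger denominator, hence a lower score.
        double lengthNorm = K1 * ((1 - B) + B * docLength / avgDocLength);
        // Saturates towards (K1 + 1) as termFreq grows.
        double tfPart = termFreq * (K1 + 1) / (termFreq + lengthNorm);
        return idf(docCount, docFreq) * tfPart;
    }
}
```

With these constants, a document of exactly average length that contains the term once scores exactly its idf. Extra occurrences add less and less, and longer-than-average documents score lower.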
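
Conceptually, the doc values item describes a column holding one typed value per document, addressed by docID. Sorting or scoring can then read that column directly instead of loading stored fields. The hypothetical class below models a numeric column as a long[] and uses it to order search hits. It is only an illustration of the idea, not Lucene's doc values API.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical per-document numeric column (one long per docID). */
final class DocValuesColumnSketch {
    private final long[] values; // values[docId] is that document's value

    DocValuesColumnSketch(long[] values) {
        this.values = values.clone();
    }

    long get(int docId) {
        return values[docId];
    }

    /** Orders matching doc IDs by their per-document value, ties broken by docID. */
    int[] sortHits(int[] hits) {
        Integer[] boxed = new Integer[hits.length];
        for (int i = 0; i < hits.length; i++) boxed[i] = hits[i];
        Comparator<Integer> byValue = Comparator.comparingLong(d -> values[d]);
        Arrays.sort(boxed, byValue.thenComparingInt(Integer::intValue));
        int[] sorted = new int[boxed.length];
        for (int i = 0; i < boxed.length; i++) sorted[i] = boxed[i];
        return sorted;
    }
}
```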
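
On the norms item: when a normalization factor has to fit in one byte, nearby values collapse to the same code. The sketch below uses a made-up 8-bit float layout (5 exponent bits plus the top 3 mantissa bits) together with the classic 1/sqrt(length) length norm. It shows fields of 100 and 110 tokens becoming indistinguishable. The encoding is an assumption for illustration, not Lucene's actual norm format.

```java
/**
 * Hypothetical one-byte norm encoding: 5 exponent bits and the top 3 explicit
 * mantissa bits of a float. Illustrative only, not Lucene's on-disk format.
 */
final class OneByteNormSketch {
    private static final int MIN_EXP = -31; // codes cover exponents -31..0

    /** Classic length normalization: 1 / sqrt(number of tokens in the field). */
    static float lengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    /** Lossy: truncates the mantissa to 3 bits and clamps the exponent range. */
    static byte encode(float f) {
        if (!(f > 0f)) return 0;                        // zero, negatives and NaN
        int bits = Float.floatToRawIntBits(f);
        int exp = ((bits >>> 23) & 0xFF) - 127;         // unbiased exponent
        int mantissa3 = (bits >>> 20) & 0x7;            // top 3 explicit mantissa bits
        int e = exp - MIN_EXP;
        if (e < 0) return 0;                            // underflow
        if (e > 31) return (byte) 0xFF;                 // overflow: largest code
        return (byte) ((e << 3) | mantissa3);
    }

    static float decode(byte code) {
        int c = code & 0xFF;
        if (c == 0) return 0f;                          // code 0 is reserved for zero/underflow
        int exp = (c >>> 3) + MIN_EXP;
        int bits = ((exp + 127) << 23) | ((c & 0x7) << 20);
        return Float.intBitsToFloat(bits);
    }
}
```

Both 1/sqrt(100) and 1/sqrt(110) share the same exponent and the same top three mantissa bits, so they map to one code that decodes to 0.09375. Keeping the full float, as the item above allows, preserves the difference.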
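
The new index statistics can be pictured on a toy field. For one field, the sketch below computes the per-term totals and the field-level sums named in the statistics item above: tokens in the field, number of postings, and documents with a posting. It also derives the average field length that a model like BM25 consumes. Everything here is hypothetical. In the Lucene 4 API, the corresponding values are exposed through methods such as Terms.getSumTotalTermFreq(), Terms.getSumDocFreq() and Terms.getDocCount().

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical statistics for one field; each document is a list of tokens. */
final class FieldStatsSketch {
    final Map<String, Long> totalTermFreq = new HashMap<>(); // occurrences of each term
    final Map<String, Integer> docFreq = new HashMap<>();    // documents containing each term
    long sumTotalTermFreq; // tokens in the field across all documents
    long sumDocFreq;       // postings, i.e. distinct (term, document) pairs
    int docCount;          // documents with at least one posting in the field

    FieldStatsSketch(List<List<String>> docs) {
        for (List<String> tokens : docs) {
            if (tokens.isEmpty()) continue;      // no posting for this field
            docCount++;
            Set<String> seen = new HashSet<>();
            for (String t : tokens) {
                totalTermFreq.merge(t, 1L, Long::sum);
                sumTotalTermFreq++;
                if (seen.add(t)) {               // first occurrence in this doc = one posting
                    docFreq.merge(t, 1, Integer::sum);
                    sumDocFreq++;
                }
            }
        }
    }

    /** Average field length over documents that have the field. */
    double avgFieldLength() {
        return docCount == 0 ? 0 : (double) sumTotalTermFreq / docCount;
    }
}
```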
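
The claim that unsigned UTF-8 byte order matches Unicode code point order is easy to check in Java, whose String.compareTo compares UTF-16 code units instead. The two orders disagree when a character above U+FFFF is compared with one in the range U+E000..U+FFFF. The helper below compares terms the way a binary term dictionary would; its name is illustrative.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: compares strings by their unsigned UTF-8 bytes. */
final class Utf8OrderSketch {
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF); // compare as unsigned bytes
            if (diff != 0) return diff;
        }
        return x.length - y.length;                     // a proper prefix sorts first
    }
}
```

For U+FFFD (UTF-8 EF BF BD) versus U+1F600 (UTF-8 F0 9F 98 80), compareUtf8 puts U+FFFD first, as code point order does. String.compareTo puts it last, because U+1F600 is stored as the surrogate pair D83D DE00 and 0xD83D < 0xFFFD.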
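
Rate-limiting merge IO, as in the item above, boils down to pausing the writer whenever it runs ahead of the configured budget. This hypothetical sketch computes that pause. The class and method names, and the convention 1 MB = 1024 * 1024 bytes, are assumptions rather than Lucene's API.

```java
/**
 * Hypothetical merge-IO throttle: after writing some bytes, compute how long the
 * writer must pause so that its throughput stays at or below mbPerSec.
 */
final class MergeRateLimiterSketch {
    private final double bytesPerSec;

    MergeRateLimiterSketch(double mbPerSec) {
        this.bytesPerSec = mbPerSec * 1024 * 1024; // assumes 1 MB = 1024 * 1024 bytes
    }

    /** Nanoseconds to sleep after writing {@code bytes} that took {@code elapsedNanos}. */
    long pauseNanos(long bytes, long elapsedNanos) {
        long targetNanos = (long) (bytes * 1_000_000_000.0 / bytesPerSec);
        return Math.max(0L, targetNanos - elapsedNanos); // no pause if already slow enough
    }

    /** Called between write chunks; sleeps only when the writer is ahead of budget. */
    void throttle(long bytes, long elapsedNanos) throws InterruptedException {
        long pause = pauseNanos(bytes, elapsedNanos);
        if (pause > 0) {
            Thread.sleep(pause / 1_000_000L, (int) (pause % 1_000_000L));
        }
    }
}
```

At 10 MB/sec, writing 1 MB should take 100 ms. If the write finished in 40 ms, the merge thread sleeps for the remaining 60 ms, leaving that IO bandwidth to searches.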
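
The "Bloom" codec in the list above relies on the basic Bloom filter property: a negative answer is always correct. A lookup for a term that is definitely absent can therefore skip the term dictionary entirely. Below is a minimal filter over term strings. The choices of FNV-1a hashing and of deriving the k probe positions by double hashing from that single hash are made for this sketch only, not taken from the codec.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical minimal Bloom filter over terms; false positives possible, no false negatives. */
final class TermBloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    TermBloomFilterSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    private static int fnv1a(byte[] data) {
        int h = 0x811C9DC5;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    private int[] positions(String term) {
        int h1 = fnv1a(term.getBytes(StandardCharsets.UTF_8));
        int h2 = Integer.rotateLeft(h1, 16) * 0x9E3779B1 | 1; // odd step, derived from h1
        int[] pos = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            pos[i] = Math.floorMod(h1 + i * h2, numBits);   // double hashing: h1 + i*h2
        }
        return pos;
    }

    void add(String term) {
        for (int p : positions(term)) bits.set(p);
    }

    boolean mightContain(String term) {
        for (int p : positions(term)) {
            if (!bits.get(p)) return false; // definitely absent: no disk seek needed
        }
        return true;                        // possibly present: consult the real dictionary
    }
}
```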
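
For AutomatonQuery, the idea is that any finite-state automaton defines a set of terms. The sketch below hand-builds a DFA for the pattern a b* c and keeps the dictionary terms it accepts. It tests every term one by one, whereas the real query can use the automaton to skip non-matching ranges of the sorted term dictionary. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical tiny DFA over chars plus a brute-force term filter. */
final class DfaTermFilterSketch {
    private final List<Map<Character, Integer>> transitions = new ArrayList<>();
    private final List<Boolean> accepting = new ArrayList<>();

    /** The first state added is the start state (state 0). */
    int addState(boolean accept) {
        transitions.add(new HashMap<>());
        accepting.add(accept);
        return transitions.size() - 1;
    }

    void addTransition(int from, char label, int to) {
        transitions.get(from).put(label, to);
    }

    boolean accepts(String term) {
        int state = 0;
        for (int i = 0; i < term.length(); i++) {
            Integer next = transitions.get(state).get(term.charAt(i));
            if (next == null) return false; // no transition: reject immediately
            state = next;
        }
        return accepting.get(state);
    }

    /** Returns the terms the automaton accepts, in dictionary order. */
    List<String> matchingTerms(List<String> termDictionary) {
        List<String> hits = new ArrayList<>();
        for (String term : termDictionary) {
            if (accepts(term)) hits.add(term);
        }
        return hits;
    }
}
```

Usage: with states s0 (start), s1 and accepting s2, and transitions s0 -a-> s1, s1 -b-> s1, s1 -c-> s2, the dictionary [abbbc, abc, abd, ac, xbc] yields [abbbc, abc, ac].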
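
According to the linked post, the FuzzyQuery speedup comes from matching terms with a Levenshtein automaton rather than computing a full edit distance for every term. The sketch below simulates such an automaton on the fly, with one edit-distance row per state. It stops reading a term as soon as no continuation can stay within maxEdits. It is a simplified stand-in: plain Levenshtein distance (insertions, deletions, substitutions), a plain term list and hypothetical names. Lucene builds its automata differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical Levenshtein automaton simulated with edit-distance rows as states. */
final class LevenshteinSketch {
    private final String query;
    private final int maxEdits;

    LevenshteinSketch(String query, int maxEdits) {
        this.query = query;
        this.maxEdits = maxEdits;
    }

    int[] start() {
        int[] row = new int[query.length() + 1];
        for (int i = 0; i < row.length; i++) row[i] = i; // distance from "" to query prefixes
        return row;
    }

    /** Transition: consume one char of the candidate term. */
    int[] step(int[] row, char c) {
        int[] next = new int[row.length];
        next[0] = row[0] + 1;
        for (int i = 1; i < row.length; i++) {
            int substitution = row[i - 1] + (query.charAt(i - 1) == c ? 0 : 1);
            next[i] = Math.min(substitution, Math.min(row[i] + 1, next[i - 1] + 1));
        }
        return next;
    }

    boolean isMatch(int[] row) {
        return row[row.length - 1] <= maxEdits;
    }

    /** False means every continuation of this prefix exceeds maxEdits. */
    boolean canMatch(int[] row) {
        for (int d : row) if (d <= maxEdits) return true;
        return false;
    }

    List<String> matchingTerms(List<String> terms) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            int[] row = start();
            for (int i = 0; i < term.length() && canMatch(row); i++) {
                row = step(row, term.charAt(i));
            }
            // If we stopped early, canMatch was false, so isMatch is false too.
            if (isMatch(row)) hits.add(term);
        }
        return hits;
    }
}
```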
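
With per-segment search, each leaf (atomic) reader numbers its documents from 0. The composite view adds the leaf's docBase to get top-level IDs. Below is a toy model of that mapping, with a predicate standing in for the per-segment query. The classes are hypothetical; in Lucene 4 the offset is exposed as AtomicReaderContext.docBase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical composite reader: ordered segments searched with local doc IDs. */
final class CompositeReaderSketch {
    static final class Leaf {
        final int docBase;          // top-level ID of this segment's doc 0
        final int maxDoc;           // documents in this segment
        final IntPredicate matches; // stand-in for a per-segment search

        Leaf(int docBase, int maxDoc, IntPredicate matches) {
            this.docBase = docBase;
            this.maxDoc = maxDoc;
            this.matches = matches;
        }
    }

    private final List<Leaf> leaves = new ArrayList<>();
    private int maxDoc; // documents across all segments so far

    /** Appends a segment; its docBase is the number of documents before it. */
    void addSegment(int segmentMaxDoc, IntPredicate localMatches) {
        leaves.add(new Leaf(maxDoc, segmentMaxDoc, localMatches));
        maxDoc += segmentMaxDoc;
    }

    /** Searches each segment with local IDs and returns top-level doc IDs. */
    List<Integer> search() {
        List<Integer> hits = new ArrayList<>();
        for (Leaf leaf : leaves) {
            for (int localDoc = 0; localDoc < leaf.maxDoc; localDoc++) {
                if (leaf.matches.test(localDoc)) hits.add(leaf.docBase + localDoc);
            }
        }
        return hits;
    }
}
```

A second segment that follows a 3-document segment has docBase 3, so its local documents 1 and 3 surface as top-level documents 4 and 6.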
> 
> Noteworthy changes since 4.0-BETA:
> 
>  * A new "Block" PostingsFormat offering improved search performance and
> index compression. This will likely become the default format in a future
> release (see http://blog.mikemccandless.com/2012/08/lucenes-new-blockpostingsformat-thanks.html).
> 
>  * All non-default codec implementations were moved to a separate codecs
> module. Just add lucene-codecs-4.0.0.jar to your classpath to test these out.
> 
>  * Payloads can be optionally stored on the term vectors.
> 
>  * Many bugfixes and optimizations.
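
The "Block" PostingsFormat item does not describe its on-disk layout. The general technique behind block-based postings compression is to delta-encode sorted doc IDs, then bit-pack each fixed-size block with only as many bits as its largest gap needs. The sketch below shows that idea. The block size, the layout and all names are illustrative assumptions rather than Lucene's file format.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical block postings: delta-encoded doc IDs, bit-packed per block. */
final class BlockPostingsSketch {
    static final class Block {
        final int bitsPerValue; // width chosen for this block's largest gap
        final int count;        // number of doc IDs in the block
        final long[] packed;

        Block(int bitsPerValue, int count, long[] packed) {
            this.bitsPerValue = bitsPerValue;
            this.count = count;
            this.packed = packed;
        }
    }

    static List<Block> encode(int[] sortedDocIds, int blockSize) {
        List<Block> blocks = new ArrayList<>();
        int prev = 0; // gaps continue across block boundaries
        for (int start = 0; start < sortedDocIds.length; start += blockSize) {
            int count = Math.min(blockSize, sortedDocIds.length - start);
            int[] deltas = new int[count];
            int max = 0;
            for (int i = 0; i < count; i++) {
                deltas[i] = sortedDocIds[start + i] - prev;
                prev = sortedDocIds[start + i];
                max = Math.max(max, deltas[i]);
            }
            int bits = Math.max(1, 32 - Integer.numberOfLeadingZeros(max));
            long[] packed = new long[(count * bits + 63) / 64];
            for (int i = 0; i < count; i++) {
                int bitPos = i * bits, word = bitPos >>> 6, shift = bitPos & 63;
                packed[word] |= (long) deltas[i] << shift;
                if (shift + bits > 64) { // value straddles two longs
                    packed[word + 1] |= (long) deltas[i] >>> (64 - shift);
                }
            }
            blocks.add(new Block(bits, count, packed));
        }
        return blocks;
    }

    static int[] decode(List<Block> blocks) {
        int total = 0;
        for (Block b : blocks) total += b.count;
        int[] out = new int[total];
        int pos = 0, prev = 0;
        for (Block b : blocks) {
            long mask = (1L << b.bitsPerValue) - 1;
            for (int i = 0; i < b.count; i++) {
                int bitPos = i * b.bitsPerValue, word = bitPos >>> 6, shift = bitPos & 63;
                long v = b.packed[word] >>> shift;
                if (shift + b.bitsPerValue > 64) v |= b.packed[word + 1] << (64 - shift);
                prev += (int) (v & mask); // undo the delta encoding
                out[pos++] = prev;
            }
        }
        return out;
    }

    /** Total storage used by the packed payloads, in bits. */
    static long packedBits(List<Block> blocks) {
        long bits = 0;
        for (Block b : blocks) bits += (long) b.packed.length * 64;
        return bits;
    }
}
```

Dense postings produce small gaps, so most blocks pack into a few bits per document. A single large gap only widens its own block.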
> 
> Please read CHANGES.txt and MIGRATE.txt for a full list of new features and
> notes on upgrading. In particular, the new APIs are not compatible with previous
> versions of Lucene, however, file format backwards compatibility is provided
> for indexes from the 3.0 series and the 4.0-alpha and -beta releases.
> 
> Please report any feedback to the mailing lists
> (http://lucene.apache.org/core/discussion.html)
> 
> Note: The Apache Software Foundation uses an extensive mirroring network for
> distributing releases.  It is possible that the mirror you are using may not have
> replicated the release yet.  If that is the case, please try another mirror.  This
> also goes for Maven access.
> 
> Happy searching,
> 
> Apache Lucene/Solr Developers



