lucene-general mailing list archives

From Bill Bell <billnb...@gmail.com>
Subject Re: [ANNOUNCE] Apache Lucene 4.0-alpha released.
Date Thu, 05 Jul 2012 01:09:22 GMT
Hey, how do we use the MemoryCodec in Solr?

Sent from my Mobile device
720-256-8076

On Jul 3, 2012, at 7:09 AM, Robert Muir <rmuir@apache.org> wrote:

> 3 July 2012, Apache Lucene™ 4.0-alpha available
> The Lucene PMC is pleased to announce the release of Apache Lucene 4.0-alpha
> 
> Apache Lucene is a high-performance, full-featured text search engine
> library written entirely in Java. It is a technology suitable for nearly
> any application that requires full-text search, especially cross-platform.
> 
> This release contains numerous bug fixes, optimizations, and
> improvements, some of which are highlighted below.  The release
> is available for immediate download at:
>   http://lucene.apache.org/core/mirrors-core-latest-redir.html?ver=4.0a
> 
> See the CHANGES.txt file included with the release for a full list of
> details.
> 
> Lucene 4.0-alpha Release Highlights:
> 
> * The index formats for terms, postings lists, stored fields, term
> vectors, etc
>   are pluggable via the Codec api. You can select from the provided
>   implementations or customize the index format with your own Codec
> to meet your needs.
> 
> * Similarity has been decoupled from the vector space model (TF/IDF).
> Additional models
>   such as BM25, Divergence from Randomness, Language Models, and
> Information-based models
>   are provided (see
> http://www.lucidimagination.com/blog/2011/09/12/flexible-ranking-in-lucene-4).
> 
> * Added support for per-document values (DocValues). DocValues can be
> used for custom
>   scoring factors (accessible via Similarity), for pre-sorted Sort
> values, and more.
> 
> * When indexing via multiple threads, each IndexWriter thread now
> flushes its own segment
>   to disk concurrently, resulting in substantial performance improvements
>   (see http://blog.mikemccandless.com/2011/05/265-indexing-speedup-with-lucenes.html).
> 
> * Per-document normalization factors ("norms") are no longer limited
> to a single byte.
>   Similarity implementations can use any DocValues type to store norms.
> 
> * Added index statistics such as the number of tokens for a term or
> field, number of postings
>   for a field, and number of documents with a posting for a field:
> these support additional
>   scoring models (see
>   http://blog.mikemccandless.com/2012/03/new-index-statistics-in-lucene-40.html).
> 
> * Implemented a new default term dictionary/index (BlockTree) that
> indexes shared prefixes
>   instead of every n'th term. This is not only more time- and space-
> efficient, but can
>   also sometimes avoid going to disk at all for terms that do not
> exist. Alternative term
>   dictionary implementations are provided and pluggable via the Codec api.
> 
> * Indexed terms are no longer UTF-16 char sequences; instead, terms
> can be any binary
>   value encoded as byte arrays. By default, text terms are now encoded as UTF-8
>   bytes. Sort order of terms is now defined by their binary value,
> which is identical
>   to UTF-8 sort order.
> 
> * Substantially faster performance when using a Filter during searching.
> 
> * File-system based directories can rate-limit the IO (MB/sec) of merge
>   threads, to reduce IO contention between merging and searching threads.
> 
> * Added a number of alternative Codecs and components for different
> use-cases: "Appending"
>   works with append-only filesystems (such as Hadoop DFS), "Memory"
> writes the entire
>   terms+postings as an FST read into RAM (see
>   http://blog.mikemccandless.com/2011/06/primary-key-lookups-are-28x-faster-with.html),
>   "Pulsing" inlines the postings for low-frequency terms into the
> term dictionary (see
>   http://blog.mikemccandless.com/2010/06/lucenes-pulsingcodec-on-primary-key.html),
>   "SimpleText" writes all files in plain-text for easy
> debugging/transparency (see
>   http://blog.mikemccandless.com/2010/10/lucenes-simpletext-codec.html),
> among others.
> 
> * Term offsets can be optionally encoded into the postings lists and
> can be retrieved
>   per-position.
> 
> * A new AutomatonQuery returns all documents containing any term
> matching a provided
>   finite-state automaton (see
> http://www.slideshare.net/otisg/finite-state-queries-in-lucene).
> 
> * FuzzyQuery is 100-200 times faster than in past releases (see
>   http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html).
> 
> * A new spell checker, DirectSpellChecker, finds possible corrections
> directly against the
>   main search index without requiring a separate index.
> 
> * Various in-memory data structures such as the term dictionary and
> FieldCache are represented
>   more efficiently with less object overhead (see
> http://blog.mikemccandless.com/2010/07/lucenes-ram-usage-for-searching.html).
> 
> * All search logic is now required to work per segment; IndexReader
> was therefore refactored to
>   differentiate between atomic and composite readers
>   (see http://blog.thetaphi.de/2012/02/is-your-indexreader-atomic-major.html).
> 
> * Lucene 4.0 provides a modular API, consolidating components such as
> Analyzers and Queries
>   that were previously scattered across Lucene core, contrib, and
> Solr. These modules also
>   include additional functionality such as UIMA analyzer integration
> and a completely reworked
>   spatial search implementation.
> 
> Please read CHANGES.txt and MIGRATE.txt for a full list of new
> features and notes on upgrading.
> In particular, the new APIs are not compatible with previous versions of
> Lucene; however, file
> format backwards compatibility is provided for indexes from the 3.0 series.
> 
> This is an alpha release for early adopters. The guarantee for this
> alpha release is that the index
> format will be the 4.0 index format, supported through the 5.x series
> of Apache Lucene, unless there
> is a critical bug (e.g. that would cause index corruption) that would
> prevent this.
> 
> Please report any feedback to the mailing lists
> (http://lucene.apache.org/core/discussion.html).
> 
> Happy searching,
> 
> Apache Lucene/Solr Developers
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
> 
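
The highlight above about term sort order (binary value, identical to UTF-8 order, instead of UTF-16 char sequences) can be illustrated with plain Java and no Lucene classes. This is only a sketch: the class name TermOrderDemo and the helper compareUtf8 are hypothetical, and the helper is not Lucene's own comparator. It just compares UTF-8 bytes as unsigned values, which is what the announcement describes. The two sample terms are chosen so that the two orderings disagree: U+FF21 fits in one UTF-16 code unit, while U+1D400 needs a surrogate pair.

```java
import java.nio.charset.StandardCharsets;

public class TermOrderDemo {
    // Hypothetical helper: compare two strings by their UTF-8 bytes,
    // treating each byte as an unsigned value (0..255).
    public static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int d = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        // Equal on the common prefix: the shorter byte sequence sorts first.
        return x.length - y.length;
    }

    public static void main(String[] args) {
        String bmp = "\uFF21";         // U+FF21, a single UTF-16 code unit
        String supp = "\uD835\uDC00";  // U+1D400, a surrogate pair in UTF-16

        // UTF-16 code unit order (String.compareTo): 0xFF21 > 0xD835
        System.out.println(Integer.signum(bmp.compareTo(supp)));    // prints 1
        // UTF-8 byte order: 0xEF < 0xF0, so U+FF21 sorts first
        System.out.println(Integer.signum(compareUtf8(bmp, supp))); // prints -1
    }
}
```

Code that relied on String.compareTo ordering of terms may therefore see a different order for terms containing characters outside the Basic Multilingual Plane.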
