From: Chris Male
To: dev@lucene.apache.org
Date: Fri, 12 Oct 2012 21:58:30 +1300
Subject: Re: [ANNOUNCE] Apache Lucene 4.0 released.
In-Reply-To: <004e01cda854$61ff1a40$25fd4ec0$@thetaphi.de>

A great day.

On Fri, Oct 12, 2012 at 9:34 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> Thanks Robert for doing the hard work of managing this release!

Absolutely. Thanks Robert.

> I am happy that the release finally came out, after a long time of
> development, code refactoring, and lots of non-finite beer-automatons!
>
> Uwe
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Robert Muir [mailto:rmuir@apache.org]
> Sent: Friday, October 12, 2012 10:10 AM
> To: dev@lucene.apache.org; Lucene mailing list; java-user; announce
> Subject: [ANNOUNCE] Apache Lucene 4.0 released.
>
> October 12 2012, Apache Lucene™ 4.0 available.
> The Lucene PMC is pleased to announce the release of Apache Lucene 4.0.
>
> Apache Lucene is a high-performance, full-featured text search engine library
> written entirely in Java. It is a technology suitable for nearly any application
> that requires full-text search, especially cross-platform.
>
> This release contains numerous bug fixes, optimizations, and improvements,
> some of which are highlighted below. The release is available for immediate
> download at:
>    http://lucene.apache.org/core/mirrors-core-latest-redir.html
>
> See the CHANGES.txt file included with the release for a full list of details.
>
> Lucene 4.0 Release Highlights:
>
>  * The index formats for terms, postings lists, stored fields, term vectors, etc.
> are pluggable via the Codec API. You can select from the provided
> implementations or customize the index format with your own Codec to meet
> your needs.
>
>  * Similarity has been decoupled from the vector space model (TF/IDF).
> Additional models such as BM25, Divergence from Randomness, Language
> Models, and Information-based models are provided (see
> http://www.lucidimagination.com/blog/2011/09/12/flexible-ranking-in-lucene-4).
>
>  * The new doc values feature stores typed values per-document. It can be
> used for custom scoring factors (accessible via Similarity), for pre-sorted Sort
> values, and more.
>
>  * IndexWriter now flushes segments to disk concurrently, when the application
> uses multiple threads for indexing, resulting in substantial performance
> improvements (see
> http://blog.mikemccandless.com/2011/05/265-indexing-speedup-with-lucenes.html).
>
>  * Per-document normalization factors ("norms") are no longer limited to a
> single byte. Similarity implementations can use any DocValues type to store
> norms.
>
>  * New index statistics have been added, including the number of tokens for a
> term or field, number of postings for a field, and number of documents with a
> posting for a field. These support additional scoring models (see
> http://blog.mikemccandless.com/2012/03/new-index-statistics-in-lucene-40.html).
>
>  * A new default term dictionary/index (BlockTree) indexes shared prefixes
> instead of every n'th term. This is not only more time- and
> space-efficient, but in certain cases can also avoid going to disk at all
> for terms that do not exist. Alternative term dictionary implementations are
> provided and pluggable via the Codec API.
>
>  * Indexed terms are no longer limited to UTF-16 char sequences; they can now
> be any binary value encoded as byte arrays. By default, text terms are encoded
> as UTF-8 bytes. Sort order of terms is defined by their binary value, which is
> identical to UTF-8 (Unicode code point) sort order.
>
>  * Substantially faster performance when using a Filter during searching.
>
>  * File-system based directories can rate-limit the IO (MB/sec) of merge
> threads, to reduce IO contention between merging and searching threads.
>
>  * A number of alternative Codecs and components have been added:
> "Appending" works with append-only filesystems (such as Hadoop DFS),
> "Memory" writes the entire terms+postings as an FST read into RAM (see
> http://blog.mikemccandless.com/2011/06/primary-key-lookups-are-28x-faster-with.html),
> "Pulsing" inlines the postings for low-frequency terms into the term dictionary
> (see http://blog.mikemccandless.com/2010/06/lucenes-pulsingcodec-on-primary-key.html),
> "SimpleText" writes all files in plain text for easy debugging/transparency (see
> http://blog.mikemccandless.com/2010/10/lucenes-simpletext-codec.html),
> "Bloom" uses a bloom filter to sometimes avoid disk seeks when looking up
> terms, "Direct" holds all postings as simple byte[] and int[] for very fast
> performance at the cost of very high RAM consumption, "Block" uses a new
> index layout and compression scheme for improved performance, among
> others.
>
>  * Term offsets can be optionally encoded into the postings lists and retrieved
> per-position.
>
>  * A new AutomatonQuery returns all documents containing any term matching
> a provided finite-state automaton (see
> http://www.slideshare.net/otisg/finite-state-queries-in-lucene).
>
>  * FuzzyQuery is 100-200 times faster than in past releases (see
> http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html).
>
>  * A new spell checker, DirectSpellChecker, finds possible corrections directly
> against the main search index without requiring a separate index.
>
>  * Various in-memory data structures such as the term dictionary and
> FieldCache are represented more efficiently with less object overhead (see
> http://blog.mikemccandless.com/2010/07/lucenes-ram-usage-for-searching.html).
>
>  * All search logic is now required to work per segment; IndexReader was
> therefore refactored to differentiate between atomic and composite readers
> (see http://blog.thetaphi.de/2012/02/is-your-indexreader-atomic-major.html).
>
>  * Lucene 4.0 provides a modular API, consolidating components such as
> Analyzers and Queries that were previously scattered across Lucene core,
> contrib, and Solr. These modules also include additional functionality such as
> UIMA analyzer integration and a completely reworked spatial search
> implementation.
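
To make the Similarity and index-statistics highlights a little more concrete, here is a minimal sketch of the textbook BM25 formula in plain Java. It is not Lucene's own implementation and does not use the Lucene API; the class name `Bm25Sketch`, the defaults k1 = 1.2 and b = 0.75, the particular IDF variant, and the sample numbers are all assumptions chosen for illustration.

```java
final class Bm25Sketch {
    // Common textbook defaults; assumptions, not values from the announcement.
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** Non-negative IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5)). */
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Score contribution of a single term in a single document. */
    static double score(double termFreq, long docCount, long docFreq,
                        double fieldLength, double avgFieldLength) {
        // Longer-than-average fields are penalized, shorter ones boosted.
        double lengthNorm = 1.0 - B + B * (fieldLength / avgFieldLength);
        double tfPart = (termFreq * (K1 + 1.0)) / (termFreq + K1 * lengthNorm);
        return idf(docCount, docFreq) * tfPart;
    }

    public static void main(String[] args) {
        // Hypothetical statistics: 1000 documents, the term occurs in 10 of them,
        // average field length 100 tokens.
        System.out.println("short doc: " + score(2, 1000, 10, 50, 100));
        System.out.println("long doc:  " + score(2, 1000, 10, 200, 100));
    }
}
```

With the same term frequency, the shorter document scores higher. The inputs are a document count, a document frequency and an average field length (total tokens divided by documents), which is the kind of statistic the highlights above describe.
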
>
> Noteworthy changes since 4.0-BETA:
>
>  * A new "Block" PostingsFormat offering improved search performance and
> index compression. This will likely become the default format in a future
> release (see
> http://blog.mikemccandless.com/2012/08/lucenes-new-blockpostingsformat-thanks.html).
>
>  * All non-default codec implementations were moved to a separate codecs
> module. Just add lucene-codecs-4.0.0.jar to your classpath to test these out.
>
>  * Payloads can be optionally stored on the term vectors.
>
>  * Many bugfixes and optimizations.
>
> Please read CHANGES.txt and MIGRATE.txt for a full list of new features and
> notes on upgrading. In particular, the new APIs are not compatible with previous
> versions of Lucene; however, file format backwards compatibility is provided
> for indexes from the 3.0 series and the 4.0-alpha and -beta releases.
>
> Please report any feedback to the mailing lists
> (http://lucene.apache.org/core/discussion.html)
>
> Note: The Apache Software Foundation uses an extensive mirroring network for
> distributing releases. It is possible that the mirror you are using may not have
> replicated the release yet. If that is the case, please try another mirror. This
> also goes for Maven access.
>
> Happy searching,
>
> Apache Lucene/Solr Developers
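
The highlight on indexed terms in the announcement above says that terms now sort by their binary value, which matches UTF-8 (Unicode code point) order. The following is a small stand-alone Java sketch of why that differs from UTF-16 order; it uses only the JDK, not the Lucene API, and the class name `TermOrderSketch` and the two sample characters are assumptions picked for illustration.

```java
import java.nio.charset.StandardCharsets;

final class TermOrderSketch {
    /** Compares two byte arrays as unsigned bytes, left to right. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // A term that is a prefix of the other sorts first.
        return a.length - b.length;
    }

    /** Orders two strings by the unsigned byte order of their UTF-8 encodings. */
    static int compareUtf8(String x, String y) {
        return compareUnsigned(x.getBytes(StandardCharsets.UTF_8),
                               y.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";                 // U+FF5E, one UTF-16 code unit
        String supplementary = "\uD83D\uDE00"; // U+1F600, a surrogate pair in UTF-16

        // UTF-16 code unit order: 0xFF5E compares greater than the high surrogate 0xD83D.
        System.out.println("UTF-16: " + Integer.signum(bmp.compareTo(supplementary)));
        // UTF-8 byte order: lead byte 0xEF compares less than 0xF0, as in code point order.
        System.out.println("UTF-8:  " + Integer.signum(compareUtf8(bmp, supplementary)));
    }
}
```

The two comparisons disagree for this pair. Code that relied on `String.compareTo` matching term order may therefore need a second look when supplementary characters are involved.
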


> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org

--
Chris Male | Open Source Search Developer | elasticsearch | www.elasticsearch.com