Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@lucene.apache.org
MIME-Version: 1.0
In-Reply-To: <004e01cda854$61ff1a40$25fd4ec0$@thetaphi.de>
References: 
 <CAOdYfZXgabz+GE0hLio19hbEHy6dGKgiSHZxF3s15L3Fa06F4A@mail.gmail.com>
	<004e01cda854$61ff1a40$25fd4ec0$@thetaphi.de>
Date: Fri, 12 Oct 2012 21:58:30 +1300
Message-ID: 
 <CACQS3vSZ_ev5yeL1ehsOs=oWumZTDvjB-DbSY5nYaD1ua0EYfw@mail.gmail.com>
Subject: Re: [ANNOUNCE] Apache Lucene 4.0 released.
From: Chris Male <gento0nz@gmail.com>
To: dev@lucene.apache.org
Content-Type: multipart/alternative; boundary=bcaec54b4ada5041e804cbd8e4b1

--bcaec54b4ada5041e804cbd8e4b1
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

A great day.

On Fri, Oct 12, 2012 at 9:34 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

> Thanks Robert for doing the hard work of managing this release!
>

Absolutely.  Thanks Robert.


>
> I am happy that the release finally came out, after a long time of
> development, code refactoring, and lots of non-finite beer-automatons!
>
> Uwe
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
> > -----Original Message-----
> > From: Robert Muir [mailto:rmuir@apache.org]
> > Sent: Friday, October 12, 2012 10:10 AM
> > To: dev@lucene.apache.org; Lucene mailing list; java-user; announce
> > Subject: [ANNOUNCE] Apache Lucene 4.0 released.
> >
> > October 12, 2012: Apache Lucene 4.0 available.
> > The Lucene PMC is pleased to announce the release of Apache Lucene 4.0.
> >
> > Apache Lucene is a high-performance, full-featured text search engine
> > library written entirely in Java. It is a technology suitable for nearly
> > any application that requires full-text search, especially
> > cross-platform.
> >
> > This release contains numerous bug fixes, optimizations, and
> > improvements, some of which are highlighted below. The release is
> > available for immediate download at:
> >    http://lucene.apache.org/core/mirrors-core-latest-redir.html
> >
> > See the CHANGES.txt file included with the release for a full list of
> > details.
> >
> > Lucene 4.0 Release Highlights:
> >
> >  * The index formats for terms, postings lists, stored fields, term
> > vectors, etc. are pluggable via the Codec API. You can select from the
> > provided implementations or customize the index format with your own
> > Codec to meet your needs.
> >
> >  * Similarity has been decoupled from the vector space model (TF/IDF).
> > Additional models such as BM25, Divergence from Randomness, Language
> > Models, and Information-based models are provided (see
> > http://www.lucidimagination.com/blog/2011/09/12/flexible-ranking-in-lucene-4).
> >
> >  * The new doc values feature stores typed values per document. It can
> > be used for custom scoring factors (accessible via Similarity), for
> > pre-sorted Sort values, and more.
> >
> >  * IndexWriter now flushes segments to disk concurrently when the
> > application uses multiple threads for indexing, resulting in substantial
> > performance improvements (see
> > http://blog.mikemccandless.com/2011/05/265-indexing-speedup-with-lucenes.html).
> >
> >  * Per-document normalization factors ("norms") are no longer limited to
> > a single byte. Similarity implementations can use any DocValues type to
> > store norms.
> >
> >  * New index statistics have been added, including the number of tokens
> > for a term or field, the number of postings for a field, and the number
> > of documents with a posting for a field. These support additional
> > scoring models (see
> > http://blog.mikemccandless.com/2012/03/new-index-statistics-in-lucene-40.html).
> >
> >  * A new default term dictionary/index (BlockTree) indexes shared
> > prefixes instead of every nth term. This is not only more time- and
> > space-efficient, but in certain cases can avoid going to disk at all for
> > terms that do not exist. Alternative term dictionary implementations are
> > provided and pluggable via the Codec API.
> >
> >  * Indexed terms are no longer limited to UTF-16 char sequences; they
> > can now be any binary value encoded as byte arrays. By default, text
> > terms are encoded as UTF-8 bytes. Terms sort by their binary value,
> > which for UTF-8 text is identical to Unicode code point order.
> >
> >  * Searching with a Filter is substantially faster.
> >
> >  * File-system-based directories can rate-limit the IO (MB/sec) of merge
> > threads to reduce IO contention between merging and searching threads.
> >
> >  * A number of alternative Codecs and components have been added,
> > among others: "Appending" works with append-only filesystems (such as
> > Hadoop DFS); "Memory" writes the entire terms+postings as an FST that is
> > read into RAM (see
> > http://blog.mikemccandless.com/2011/06/primary-key-lookups-are-28x-faster-with.html);
> > "Pulsing" inlines the postings for low-frequency terms into the term
> > dictionary (see
> > http://blog.mikemccandless.com/2010/06/lucenes-pulsingcodec-on-primary-key.html);
> > "SimpleText" writes all files in plain text for easy
> > debugging/transparency (see
> > http://blog.mikemccandless.com/2010/10/lucenes-simpletext-codec.html);
> > "Bloom" uses a Bloom filter to sometimes avoid disk seeks when looking
> > up terms; "Direct" holds all postings as simple byte[] and int[] arrays
> > for very fast performance at the cost of very high RAM consumption; and
> > "Block" uses a new index layout and compression scheme for improved
> > performance.
> >
> >  * Term offsets can optionally be encoded into the postings lists and
> > retrieved per position.
> >
> >  * A new AutomatonQuery returns all documents containing any term
> > matching a provided finite-state automaton (see
> > http://www.slideshare.net/otisg/finite-state-queries-in-lucene).
> >
> >  * FuzzyQuery is 100-200 times faster than in past releases (see
> > http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html).
> >
> >  * A new spell checker, DirectSpellChecker, finds possible corrections
> > directly against the main search index without requiring a separate
> > index.
> >
> >  * Various in-memory data structures such as the term dictionary and
> > FieldCache are represented more efficiently with less object overhead
> > (see
> > http://blog.mikemccandless.com/2010/07/lucenes-ram-usage-for-searching.html).
> >
> >  * All search logic is now required to work per segment; IndexReader was
> > therefore refactored to differentiate between atomic and composite
> > readers (see
> > http://blog.thetaphi.de/2012/02/is-your-indexreader-atomic-major.html).
> >
> >  * Lucene 4.0 provides a modular API, consolidating components such as
> > Analyzers and Queries that were previously scattered across Lucene core,
> > contrib, and Solr. These modules also include additional functionality
> > such as UIMA analyzer integration and a completely reworked spatial
> > search implementation.
> >
> > Noteworthy changes since 4.0-BETA:
> >
> >  * A new "Block" PostingsFormat offers improved search performance and
> > index compression. It will likely become the default format in a future
> > release (see
> > http://blog.mikemccandless.com/2012/08/lucenes-new-blockpostingsformat-thanks.html).
> >
> >  * All non-default codec implementations were moved to a separate codecs
> > module. Just add lucene-codecs-4.0.0.jar to your classpath to try them
> > out.
> >
> >  * Payloads can optionally be stored in term vectors.
> >
> >  * Many bug fixes and optimizations.
> >
> > Please read CHANGES.txt and MIGRATE.txt for a full list of new features
> > and notes on upgrading. In particular, the new APIs are not compatible
> > with previous versions of Lucene; however, file format backwards
> > compatibility is provided for indexes from the 3.0 series and the
> > 4.0-ALPHA and 4.0-BETA releases.
> >
> > Please report any feedback to the mailing lists
> > (http://lucene.apache.org/core/discussion.html)
> >
> > Note: The Apache Software Foundation uses an extensive mirroring network
> > for distributing releases. It is possible that the mirror you are using
> > has not replicated the release yet. If that is the case, please try
> > another mirror. This also applies to Maven access.
> >
> > Happy searching,
> >
> > Apache Lucene/Solr Developers
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>
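
The announcement notes that terms now sort by their binary value, which for UTF-8 text matches Unicode code point order. The standalone Java sketch below compares unsigned UTF-8 bytes and, for contrast, code points. The class and method names are hypothetical, not Lucene API. Plain `String.compareTo` compares UTF-16 chars instead, which is why the orders can differ for characters outside the Basic Multilingual Plane.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helpers contrasting UTF-8 byte order, code point order and UTF-16 order. */
final class Utf8TermOrder {
    private Utf8TermOrder() {}

    /** Lexicographic comparison of UTF-8 encodings, treating each byte as unsigned (0..255). */
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF); // mask: Java bytes are signed
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length; // a proper prefix sorts first
    }

    /** Lexicographic comparison by Unicode code point, for reference. */
    static int compareCodePoints(String a, String b) {
        int[] x = a.codePoints().toArray();
        int[] y = b.codePoints().toArray();
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        return Integer.compare(x.length, y.length);
    }
}
```

For example, U+FFFD sorts before U+1F600 both by code point and by UTF-8 bytes (first byte 0xEF vs. 0xF0). It sorts after U+1F600 by UTF-16 chars, because U+1F600 is stored as the surrogate pair 0xD83D 0xDE00.
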
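
The new index statistics (tokens per term and per field, postings per field, and documents with a posting in the field) are what models such as BM25 consume. The sketch below is illustrative only, not Lucene's API. It computes those statistics over a tiny in-memory field and plugs them into the textbook BM25 weight, assuming the idf variant ln(1 + (N - n + 0.5) / (n + 0.5)) and an average field length of sumTotalTermFreq / docCount.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-field statistics plus a textbook BM25 weight built on them. */
final class FieldStatsBm25 {
    final Map<String, Long> totalTermFreq = new HashMap<>(); // tokens for each term
    final Map<String, Integer> docFreq = new HashMap<>();    // postings (documents) per term
    long sumTotalTermFreq; // tokens in the whole field
    long sumDocFreq;       // postings in the whole field
    int docCount;          // documents with at least one posting in the field

    FieldStatsBm25(List<List<String>> docs) {
        for (List<String> tokens : docs) {
            if (tokens.isEmpty()) {
                continue; // no posting in this field, so it does not count toward docCount
            }
            docCount++;
            Set<String> seenInDoc = new HashSet<>();
            for (String term : tokens) {
                totalTermFreq.merge(term, 1L, Long::sum);
                sumTotalTermFreq++;
                if (seenInDoc.add(term)) { // first occurrence in this doc = one new posting
                    docFreq.merge(term, 1, Integer::sum);
                    sumDocFreq++;
                }
            }
        }
    }

    /** BM25 weight for a term occurring tf times in a doc of docLen tokens. Assumes docCount > 0. */
    double bm25(String term, int tf, int docLen, double k1, double b) {
        int n = docFreq.getOrDefault(term, 0);
        double idf = Math.log(1 + (docCount - n + 0.5) / (n + 0.5));
        double avgDocLen = (double) sumTotalTermFreq / docCount;
        double lengthNorm = k1 * (1 - b + b * docLen / avgDocLen);
        return idf * tf * (k1 + 1) / (tf + lengthNorm);
    }
}
```

With k1 = 1.2 and b = 0.75 (values commonly used as BM25 defaults), a rarer term receives a higher weight than a more common one at the same term frequency and document length.
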
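
The "Bloom" codec bullet relies on a basic property of Bloom filters: they can answer "definitely absent" without touching the term dictionary. A minimal sketch of that data structure follows. The sizing and the double-hashing scheme built on `String.hashCode` are assumptions for illustration, not how the codec actually hashes terms.

```java
import java.util.BitSet;

/**
 * Minimal Bloom filter over terms. A negative answer is definitive; a positive
 * answer only means "maybe" and still needs the normal dictionary lookup.
 */
final class TermBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    TermBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    void add(String term) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(term, i));
        }
    }

    /** Returns false only if the term was definitely never added. */
    boolean mightContain(String term) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(term, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: probe i uses h1 + i * h2 (mod numBits); h2 is derived from h1 and forced odd.
    private int index(String term, int i) {
        int h1 = term.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 16) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }
}
```

Lookups that return false can skip the disk seek entirely. Lookups that return true still go to the term dictionary, since a Bloom filter can report false positives.
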
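
FuzzyQuery and DirectSpellChecker both look for index terms within a small edit distance of the input. As a baseline, the brute-force sketch below computes plain Levenshtein distance (insertions, deletions, substitutions) against every term. According to the linked FuzzyQuery post, the 4.0 speedup comes from intersecting a Levenshtein automaton with the term dictionary instead of scanning all terms. The names here are hypothetical, and the real query's edit model may differ in details.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical brute-force fuzzy term matching via Levenshtein distance. */
final class FuzzyTerms {
    private FuzzyTerms() {}

    /** Classic two-row dynamic-programming edit distance over UTF-16 chars. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty string to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from the first i chars of a to the empty string
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the row just filled becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Every term within maxEdits of the query, in input order. Scans all terms. */
    static List<String> withinEdits(String query, Iterable<String> terms, int maxEdits) {
        List<String> matches = new ArrayList<>();
        for (String term : terms) {
            if (levenshtein(query, term) <= maxEdits) {
                matches.add(term);
            }
        }
        return matches;
    }
}
```

With maxEdits = 1, the query "lucene" matches "lucine" (one substitution) and "lucerne" (one insertion), but not "solr".
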
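
The per-segment bullet implies a simple mapping: a composite reader's top-level doc IDs are each segment's local IDs, shifted by that segment's starting offset. The sketch below builds those offsets from segment sizes and maps a top-level ID back to its segment with a binary search. It is hypothetical and not Lucene's reader API.

```java
import java.util.Arrays;

/** Hypothetical mapping between top-level doc IDs and (segment, local doc ID). */
final class SegmentDocIds {
    private final int[] docBases; // docBases[i] = top-level ID of segment i's first doc
    private final int maxDoc;     // total docs across all segments

    SegmentDocIds(int... segmentMaxDocs) {
        docBases = new int[segmentMaxDocs.length];
        int base = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            if (segmentMaxDocs[i] <= 0) {
                // Empty segments would produce duplicate bases and confuse the search below.
                throw new IllegalArgumentException("segment " + i + " must be non-empty");
            }
            docBases[i] = base;
            base += segmentMaxDocs[i];
        }
        maxDoc = base;
    }

    /** Index of the segment holding top-level doc ID `doc`. */
    int segmentOf(int doc) {
        if (doc < 0 || doc >= maxDoc) {
            throw new IllegalArgumentException("doc out of range: " + doc);
        }
        int pos = Arrays.binarySearch(docBases, doc);
        // Exact hit: doc is the first doc of segment pos. Otherwise binarySearch returns
        // -(insertionPoint) - 1, and the owning segment is insertionPoint - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Segment-local doc ID for top-level doc ID `doc`. */
    int localDoc(int doc) {
        return doc - docBases[segmentOf(doc)];
    }
}
```

With segments of 3, 5 and 2 documents, the bases are 0, 3 and 8. Top-level doc 7 is therefore local doc 4 of the second segment.
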
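
The "Block" PostingsFormat bullet mentions improved compression. A common idea behind block-based postings is to store doc IDs as gaps and to bit-pack each fixed-size block of gaps using only as many bits as its largest gap needs. The sketch below illustrates just that idea with hypothetical names. The real format's block size and on-disk layout are described in the linked post and are not reproduced here.

```java
/** Hypothetical gap encoding plus per-block bit-width accounting for sorted doc IDs. */
final class BlockPacking {
    private BlockPacking() {}

    /** Gaps between consecutive sorted doc IDs; the first gap is relative to 0. */
    static int[] deltas(int[] sortedDocIds) {
        int[] out = new int[sortedDocIds.length];
        int prev = 0;
        for (int i = 0; i < sortedDocIds.length; i++) {
            out[i] = sortedDocIds[i] - prev;
            prev = sortedDocIds[i];
        }
        return out;
    }

    /** Bits needed for every value in values[from, to): the width of their OR, at least 1. */
    static int bitsRequired(int[] values, int from, int to) {
        int or = 0;
        for (int i = from; i < to; i++) {
            or |= values[i];
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
    }

    /** Total packed bits when gaps are split into blocks of blockSize, each at its own width. */
    static long packedBits(int[] sortedDocIds, int blockSize) {
        int[] gaps = deltas(sortedDocIds);
        long total = 0;
        for (int from = 0; from < gaps.length; from += blockSize) {
            int to = Math.min(from + blockSize, gaps.length);
            total += (long) bitsRequired(gaps, from, to) * (to - from);
        }
        return total;
    }
}
```

For doc IDs 5, 7, 8 and 20, the gaps are 5, 2, 1 and 12. Packed as one block, they need 4 bits each (16 bits in total), versus 128 bits as raw 32-bit ints.
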
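
Rate-limiting merge IO to a target MB/sec amounts to pausing the merge thread whenever it gets ahead of schedule. A minimal sketch of that bookkeeping might look like the following. The names are hypothetical, the clock is supplied by the caller so the class is easy to test, and this is not Lucene's own implementation.

```java
/**
 * Hypothetical merge-IO throttle: tracks bytes written since startNanos and
 * reports how long the writer should pause so that its average rate stays at
 * or below the configured MB/sec (1 MB = 1024 * 1024 bytes here).
 */
final class MergeIoThrottle {
    private final double bytesPerSec;
    private final long startNanos;
    private long bytesWritten;

    MergeIoThrottle(double mbPerSec, long startNanos) {
        if (mbPerSec <= 0) {
            throw new IllegalArgumentException("mbPerSec must be positive");
        }
        this.bytesPerSec = mbPerSec * 1024 * 1024;
        this.startNanos = startNanos;
    }

    /** Records `bytes` written at time nowNanos; returns nanoseconds to sleep (0 if under budget). */
    long pauseNanos(long bytes, long nowNanos) {
        bytesWritten += bytes;
        // Earliest time at which this many bytes may have been written at the target rate.
        long earliestAllowed = startNanos + (long) (bytesWritten * 1e9 / bytesPerSec);
        return Math.max(0L, earliestAllowed - nowNanos);
    }
}
```

At 1 MB/sec, writing 1 MB immediately after start yields a one-second pause. A caller would typically sleep for the returned duration, for example with `TimeUnit.NANOSECONDS.sleep(pause)`.
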
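
AutomatonQuery matches documents containing any term accepted by a finite-state automaton. The hypothetical sketch below hand-builds a DFA for the pattern `ab*c` and tests terms one by one. This brute-force scan is only for illustration; a real implementation can instead walk the automaton and the sorted term dictionary together to skip non-matching ranges.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Tiny deterministic finite automaton over chars. State 0 is the start state,
 * so it must be added first. A term is accepted if every char has a transition
 * and the run ends in an accepting state.
 */
final class TermDfa {
    private final List<Map<Character, Integer>> transitions = new ArrayList<>();
    private final List<Boolean> accepting = new ArrayList<>();

    int addState(boolean isAccepting) {
        transitions.add(new HashMap<>());
        accepting.add(isAccepting);
        return transitions.size() - 1;
    }

    void addTransition(int from, char label, int to) {
        transitions.get(from).put(label, to);
    }

    boolean accepts(String term) {
        int state = 0;
        for (int i = 0; i < term.length(); i++) {
            Integer next = transitions.get(state).get(term.charAt(i));
            if (next == null) {
                return false; // no transition: reject immediately
            }
            state = next;
        }
        return accepting.get(state);
    }

    /** Brute force: tests every term against the automaton. */
    List<String> acceptedTerms(Iterable<String> terms) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            if (accepts(term)) {
                out.add(term);
            }
        }
        return out;
    }
}
```

Here `abc` and `ac` are accepted. `abx` is rejected because there is no transition on `x`, and `c` is rejected because there is no transition on `c` from the start state.
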


-- 
Chris Male | Open Source Search Developer | elasticsearch |
www.elasticsearch.com

--bcaec54b4ada5041e804cbd8e4b1--