From: Bill Bell
Subject: Re: [ANNOUNCE] Apache Lucene 4.0-alpha released.
Date: Wed, 4 Jul 2012 19:09:22 -0600
To: "dev@lucene.apache.org"
Cc: "dev@lucene.apache.org", Lucene mailing list, java-user, announce

Hey, how do we use the MemoryCodec in Solr?

Sent from my Mobile device

On Jul 3, 2012, at 7:09 AM, Robert Muir wrote:

> 3 July 2012, Apache Lucene 4.0-alpha available
> The Lucene PMC is pleased to announce the release of Apache Lucene 4.0-alpha
>
> Apache Lucene is a high-performance, full-featured text search engine
> library written entirely in Java. It is a technology suitable for nearly
> any application that requires full-text search, especially cross-platform.
>
> This release contains numerous bug fixes, optimizations, and
> improvements, some of which are highlighted below. The release
> is available for immediate download at:
> http://lucene.apache.org/core/mirrors-core-latest-redir.html?ver=4.0a
>
> See the CHANGES.txt file included with the release for a full list of
> details.
>
> Lucene 4.0-alpha Release Highlights:
>
> * The index formats for terms, postings lists, stored fields, term
>   vectors, etc. are pluggable via the Codec API. You can select from the
>   provided implementations or customize the index format with your own
>   Codec to meet your needs.
>
> * Similarity has been decoupled from the vector space model (TF/IDF).
>   Additional models such as BM25, Divergence from Randomness, Language
>   Models, and Information-based models are provided (see
>   http://www.lucidimagination.com/blog/2011/09/12/flexible-ranking-in-lucene-4).
>
> * Added support for per-document values (DocValues). DocValues can be
>   used for custom scoring factors (accessible via Similarity), for
>   pre-sorted Sort values, and more.
>
> * When indexing via multiple threads, each IndexWriter thread now
>   flushes its own segment to disk concurrently, resulting in substantial
>   performance improvements (see
>   http://blog.mikemccandless.com/2011/05/265-indexing-speedup-with-lucenes.html).
>
> * Per-document normalization factors ("norms") are no longer limited
>   to a single byte. Similarity implementations can use any DocValues
>   type to store norms.
>
> * Added index statistics such as the number of tokens for a term or
>   field, number of postings for a field, and number of documents with a
>   posting for a field: these support additional scoring models (see
>   http://blog.mikemccandless.com/2012/03/new-index-statistics-in-lucene-40.html).
>
> * Implemented a new default term dictionary/index (BlockTree) that
>   indexes shared prefixes instead of every n'th term. This is not only
>   more time- and space-efficient, but can also sometimes avoid going to
>   disk at all for terms that do not exist. Alternative term dictionary
>   implementations are provided and pluggable via the Codec API.
>
> * Indexed terms are no longer UTF-16 char sequences; instead, terms
>   can be any binary value encoded as byte arrays. By default, text terms
>   are now encoded as UTF-8 bytes. Sort order of terms is now defined by
>   their binary value, which is identical to UTF-8 sort order.
>
> * Substantially faster performance when using a Filter during searching.
>
> * File-system based directories can rate-limit the IO (MB/sec) of merge
>   threads, to reduce IO contention between merging and searching threads.
>
> * Added a number of alternative Codecs and components for different
>   use-cases: "Appending" works with append-only filesystems (such as
>   Hadoop DFS), "Memory" writes the entire terms+postings as an FST read
>   into RAM (see
>   http://blog.mikemccandless.com/2011/06/primary-key-lookups-are-28x-faster-with.html),
>   "Pulsing" inlines the postings for low-frequency terms into the
>   term dictionary (see
>   http://blog.mikemccandless.com/2010/06/lucenes-pulsingcodec-on-primary-key.html),
>   "SimpleText" writes all files in plain-text for easy
>   debugging/transparency (see
>   http://blog.mikemccandless.com/2010/10/lucenes-simpletext-codec.html),
>   among others.
>
> * Term offsets can be optionally encoded into the postings lists and
>   can be retrieved per-position.
>
> * A new AutomatonQuery returns all documents containing any term
>   matching a provided finite-state automaton (see
>   http://www.slideshare.net/otisg/finite-state-queries-in-lucene).
>
> * FuzzyQuery is 100-200 times faster than in past releases (see
>   http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html).
>
> * A new spell checker, DirectSpellChecker, finds possible corrections
>   directly against the main search index without requiring a separate
>   index.
>
> * Various in-memory data structures such as the term dictionary and
>   FieldCache are represented more efficiently with less object overhead
>   (see
>   http://blog.mikemccandless.com/2010/07/lucenes-ram-usage-for-searching.html).
>
> * All search logic is now required to work per segment; IndexReader
>   was therefore refactored to differentiate between atomic and composite
>   readers (see
>   http://blog.thetaphi.de/2012/02/is-your-indexreader-atomic-major.html).
>
> * Lucene 4.0 provides a modular API, consolidating components such as
>   Analyzers and Queries that were previously scattered across Lucene
>   core, contrib, and Solr.
>   These modules also include additional functionality such as UIMA
>   analyzer integration and a completely reworked spatial search
>   implementation.
>
> Please read CHANGES.txt and MIGRATE.txt for a full list of new
> features and notes on upgrading. In particular, the new APIs are not
> compatible with previous versions of Lucene; however, file format
> backwards compatibility is provided for indexes from the 3.0 series.
>
> This is an alpha release for early adopters. The guarantee for this
> alpha release is that the index format will be the 4.0 index format,
> supported through the 5.x series of Apache Lucene, unless there is a
> critical bug (e.g. one that would cause index corruption) that would
> prevent this.
>
> Please report any feedback to the mailing lists
> (http://lucene.apache.org/core/discussion.html)
>
> Happy searching,
>
> Apache Lucene/Solr Developers
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
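
The quoted highlight about term sort order (terms now sort by their binary value, i.e. UTF-8 byte order, rather than as UTF-16 char sequences) can be illustrated with plain Java. This is only a sketch using the JDK; the class name TermOrderDemo and the method compareUtf8 are made up for illustration and are not part of the Lucene API. It shows one case where the two orders disagree: a BMP character above the surrogate range versus a supplementary character.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class (not a Lucene class); uses only the JDK.
final class TermOrderDemo {

    // Compare two strings by their UTF-8 encoding, treating bytes as unsigned.
    // This mimics "sort by binary value" for UTF-8 encoded terms.
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // One encoding is a prefix of the other: the shorter one sorts first.
        return x.length - y.length;
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";                 // U+FF5E, a single UTF-16 char
        String supplementary = "\uD83D\uDE00"; // U+1F600, a surrogate pair

        // String.compareTo compares UTF-16 code units.
        System.out.println("UTF-16 order: " + Integer.signum(bmp.compareTo(supplementary)));
        // compareUtf8 compares the encoded bytes.
        System.out.println("UTF-8 order:  " + Integer.signum(compareUtf8(bmp, supplementary)));
    }
}
```

Under UTF-16 comparison U+FF5E sorts after U+1F600 (its code unit 0xFF5E is greater than the lead surrogate 0xD83D), while under UTF-8 byte comparison it sorts before (lead byte 0xEF is less than 0xF0). Code that relied on the old UTF-16 term order may therefore see terms enumerated in a different order.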