lucene-java-commits mailing list archives

From Apache Wiki <>
Subject [Lucene-java Wiki] Update of "LucenePapers" by jpountz
Date Sun, 24 Jun 2012 12:34:57 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The "LucenePapers" page has been changed by jpountz:

New page:
= Lucene Papers =

To understand the fundamental ideas behind Lucene, you should first get familiar with InformationRetrieval.
This page tries to collect links to resources that present more advanced ideas.

== Storage ==

=== Postings list encoding ===

In addition to VInt encoding, Lucene supports (or plans to support) other postings list encoding
formats such as frame of reference (FOR), patched frame of reference (PFOR) and Simple9:

 * [[|Performance of Compressed Inverted List
Caching in Search Engines]]. Jiangong Zhang, Xiaohui Long, Torsten Suel. (2008)
 * [[|Lucene
performance with the PForDelta codec]]. Mike McCandless, Changing bits, August 2nd, 2010.
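
As a point of reference for the formats above, the following is a minimal sketch of VInt encoding applied to a sorted postings list. It assumes the usual scheme of storing gaps between consecutive document IDs, 7 data bits per byte, low-order bits first, with the high bit marking "more bytes follow". The class and method names (`VIntPostingsSketch`, `encode`, `decode`) are hypothetical and are not Lucene's API.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class VIntPostingsSketch {

    // Writes one non-negative int as a variable-length integer:
    // 7 data bits per byte, low-order group first; the high bit is set
    // on every byte except the last one.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encodes a strictly increasing list of doc IDs as VInt-encoded gaps.
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docId : sortedDocIds) {
            writeVInt(out, docId - previous); // small gaps need fewer bytes
            previous = docId;
        }
        return out.toByteArray();
    }

    // Decodes the gaps and sums them back into absolute doc IDs.
    static int[] decode(byte[] bytes) {
        List<Integer> docIds = new ArrayList<>();
        int pos = 0;
        int previous = 0;
        while (pos < bytes.length) {
            int value = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += value;
            docIds.add(previous);
        }
        int[] result = new int[docIds.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = docIds.get(i);
        }
        return result;
    }
}
```

Because each value is decoded byte by byte with a branch per byte, block-oriented formats such as FOR and PFOR can decode faster, which is the trade-off the resources above discuss.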

=== The Pulsing codec ===

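A codec optimized for fields with many rare terms, such as primary-key fields: the postings of terms that occur in very few documents are inlined into the terms dictionary instead of being stored in separate postings files.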

 * [[|Optimizations for Dynamic
Inverted Index maintenance]]. Doug Cutting, Jan Pedersen.
 * [[|Lucene's
PulsingCodec on "Primary Key" Fields]]. Mike McCandless, Changing bits, June 5th, 2010.

== Query execution ==

=== Terms dictionary ===

Lucene has a new block tree terms dictionary, inspired by burst tries.

 * [[|LUCENE-3030 Block tree terms dict &
 * [[|Automata
invasion]] Robert Muir, Michael McCandless,
 * [[|Burst Tries: A Fast, Efficient
Data Structure for String Keys]]. Steffen Heinz, Justin Zobel, Hugh E. Williams. (2002)

=== NumericRangeQuery ===

Lucene has an optimized range query implementation for numeric types:

 * [[|NumericRangeQuery]],
 * [[|Generic XML-based Framework for Metadata
Portals. Computers & Geosciences 34 (12), 1947-1955]]. Schindler, U, Diepenbroek, M (2008).

=== Automaton-based fuzzy query ===

Lucene 4.0 supports an improved fuzzy query implementation that is based on Levenshtein automata.

 * [[|Fast String Correction
with Levenshtein-Automata.]] Klaus Schulz, Stoyan Mihov. (2002)
 * [[|Lucene's
FuzzyQuery is 100 times faster in 4.0]]. Mike McCandless, Changing bits, March 24th, 2011.
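
A Levenshtein automaton for a query term accepts exactly the strings that lie within a given edit distance of that term. The sketch below illustrates only that acceptance criterion, using the classic dynamic-programming edit distance. It is not the automaton construction described in the paper above and not Lucene's implementation; the names `EditDistanceSketch`, `distance` and `matches` are hypothetical.

```java
public class EditDistanceSketch {

    // Classic dynamic-programming Levenshtein distance, counting
    // insertions, deletions and substitutions, using two rolling rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True when `term` is within `maxEdits` edits of `query`, i.e. when a
    // Levenshtein automaton built for `query` and `maxEdits` would accept it.
    static boolean matches(String term, String query, int maxEdits) {
        return distance(term, query) <= maxEdits;
    }
}
```

Running this check against every term costs time proportional to the size of the terms dictionary. The automaton-based approach avoids that by intersecting the automaton with the sorted terms, so only candidate terms are visited.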

== Misc ==

=== FST compression ===

Lucene makes heavy use of finite state transducers (FSTs), so their in-memory size is important.

 * [[|Smaller Representation
of Finite State Automata]]. Jan Daciuk, Dawid Weiss.
