lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Lucene Block term Dictionary
Date Wed, 06 Jul 2016 22:20:49 GMT
The latest terms dictionary is "block tree", and unfortunately there is no
guide to it besides, of course, the source code
(BlockTreeTermsWriter/Reader).  See especially the comments in those
sources: they point to a paper describing the inspiration for this
implementation.

The high-level view is that this terms dictionary breaks up the sorted
terms into variable-sized blocks (25 to 48 terms in each block) at "good"
boundaries, where the term prefixes change, to maximize overall compression.
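
To make the block-splitting idea concrete, here is a minimal Python sketch.
It is not Lucene's actual algorithm (see BlockTreeTermsWriter for that); it
is a simplified greedy stand-in that cuts each block where adjacent terms
share the shortest prefix, within the 25 to 48 term window. The function
names and the greedy rule are assumptions made for illustration only.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the shared leading prefix of two terms."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def split_into_blocks(sorted_terms, min_size=25, max_size=48):
    """Greedy toy splitter (hypothetical, not Lucene's real logic).

    Each block gets between min_size and max_size terms, and the cut is
    placed where two adjacent terms share the shortest prefix, i.e. where
    the prefix "changes" the most. The final block takes whatever is left
    and may be smaller than min_size.
    """
    blocks = []
    start = 0
    n = len(sorted_terms)
    while n - start > max_size:
        # `cut` is the index at which the next block would begin.
        best_cut = min(
            range(start + min_size, start + max_size + 1),
            key=lambda cut: common_prefix_len(
                sorted_terms[cut - 1], sorted_terms[cut]
            ),
        )
        blocks.append(sorted_terms[start:best_cut])
        start = best_cut
    if start < n:
        blocks.append(sorted_terms[start:])
    return blocks
```

With 30 terms each under the prefixes "apple", "banana" and "cherry", this
sketch cuts exactly at the prefix changes, so terms within a block share a
long prefix that only needs to be stored once.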

The in-memory (JVM heap) FST terms index is used to find which on-disk
block may contain a given term. To look up a term, we walk the FST, then
seek to that block and scan it.
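
The two-step lookup (consult the in-memory index, then scan one block) can
be sketched roughly as follows. This is a toy model, not Lucene code: the
FST is replaced by a plain sorted list of each block's first term searched
with bisect, and the "on-disk" blocks are ordinary Python lists. The names
build_index and lookup are hypothetical.

```python
import bisect


def build_index(blocks):
    """Stand-in for the FST terms index: the first term of each block.

    Assumes blocks are non-empty and globally sorted, so this list is
    sorted too. The real index is an FST over term prefixes.
    """
    return [block[0] for block in blocks]


def lookup(term, index, blocks):
    """Return (block_number, position_in_block) for term, or None."""
    # Step 1: use the in-memory index to find the only block that may
    # contain the term (the last block whose first term is <= term).
    i = bisect.bisect_right(index, term) - 1
    if i < 0:
        return None  # term sorts before every block

    # Step 2: "seek" to that block and scan it linearly.
    for pos, candidate in enumerate(blocks[i]):
        if candidate == term:
            return (i, pos)
        if candidate > term:
            break  # terms are sorted, so the term cannot appear later
    return None
```

For example, with blocks [["apple", "apricot"], ["banana", "blueberry"],
["cherry"]], looking up "blueberry" selects the second block from the index
and finds the term on the scan, while "avocado" selects the first block and
the scan comes up empty. Only one block is ever scanned per lookup.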

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 6, 2016 at 12:04 PM, Mohit Sidana <msidana89@gmail.com> wrote:

> Hello,
>
> I am interested in learning more about how Lucene uses the block tree
> term dictionary.
>
> While researching this topic I found some useful information at the
> links below.
>
>
> 1. http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html
> 2. http://blog.mikemccandless.com/2013/09/lucene-now-has-in-memory-terms.html
> 3. http://www.slideshare.net/lucenerevolution/what-is-inaluceneagrandfinal
>
>
> I understand that Lucene uses an FST to store term prefixes in memory
> and to look up terms/postings on disk, but I am unable to visualize how
> the actual search works in Lucene 6.0.
>
> Can someone please suggest a guide I can follow to understand, step by
> step, how a term search actually works with the block terms
> dictionary?
>
> Thanks.
>
