Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@lucene.apache.org
MIME-Version: 1.0
In-Reply-To: <CAAqRqaNUCetitjKOsd19BsOqnz9o3jWXLdoyLz9w2B=fvO4LJg@mail.gmail.com>
References: <CAAqRqaNUCetitjKOsd19BsOqnz9o3jWXLdoyLz9w2B=fvO4LJg@mail.gmail.com>
From: Michael McCandless <lucene@mikemccandless.com>
Date: Wed, 6 Jul 2016 18:20:49 -0400
Message-ID: <CAL8PwkZ1f39NiUvzA5JqQ78KQ+CV0x+Vz4ktLmk9xP+CRHOq+A@mail.gmail.com>
Subject: Re: Lucene Block term Dictionary
To: "Lucene/Solr dev" <dev@lucene.apache.org>, msidana89@gmail.com
Content-Type: multipart/alternative; boundary=001a113f377a750c990536fefcc6
archived-at: Wed, 06 Jul 2016 22:21:15 -0000

--001a113f377a750c990536fefcc6
Content-Type: text/plain; charset=UTF-8

The latest terms dictionary is "block tree", and unfortunately there are no
guides for it besides, of course, the source code
(BlockTreeTermsWriter/BlockTreeTermsReader).  See especially the comments in
those sources: they point to a paper describing the inspiration for this
implementation.

The high-level view is that this terms dictionary breaks up the sorted
terms into variable-sized blocks (25 to 48 terms in each block) at "good"
boundaries, where the term prefixes change, to maximize overall compression.
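
As a rough illustration of the blocking idea only (this is not the actual
BlockTreeTermsWriter algorithm, which groups terms by shared prefix and is
considerably more involved), here is a toy Python sketch. It cuts a sorted
term list into blocks of at most max_size terms, and at least min_size except
possibly for the last block, placing each cut where the two neighbouring terms
share the shortest prefix. The function names and the tiny block sizes are
invented for the example.

```python
import os


def shared_prefix_len(a, b):
    """Length of the common prefix of two strings."""
    return len(os.path.commonprefix([a, b]))


def split_into_blocks(sorted_terms, min_size=2, max_size=4):
    """Greedy toy blocker (an assumption for illustration, not Lucene's code).

    Walks the sorted terms and, for each block, tries every allowed size and
    cuts where the term before the cut and the term after it share the
    shortest prefix, i.e. where the prefix "changes" the most.
    """
    blocks = []
    start = 0
    n = len(sorted_terms)
    while start < n:
        if n - start <= max_size:
            # The final block simply takes whatever is left.
            blocks.append(sorted_terms[start:])
            break
        best_cut = None
        best_shared = None
        for size in range(min_size, max_size + 1):
            cut = start + size
            shared = shared_prefix_len(sorted_terms[cut - 1], sorted_terms[cut])
            # Prefer the shortest shared prefix; on ties, prefer the larger block.
            if best_shared is None or shared <= best_shared:
                best_cut, best_shared = cut, shared
        blocks.append(sorted_terms[start:best_cut])
        start = best_cut
    return blocks
```

For example, calling split_into_blocks on the sorted terms apple, apply, apt,
bar, bat, baz, cat, cave, dog with the default sizes cuts after "apt" and
after "baz", because those are the points where adjacent terms share no
prefix at all.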

The in-memory (JVM heap) FST terms index is used to find which on-disk
block may have a given term.  So to look up a term, we walk the FST, then
seek to that block and scan it.
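
To help visualize that lookup, here is a small Python sketch. It is only an
analogy: a sorted list holding the first term of each block, searched with
bisect, stands in for the FST index (the real index maps term prefixes to
on-disk block pointers), and a list of in-memory lists stands in for the
on-disk blocks. ToyTermsDict and its method names are invented for the
example and are not Lucene APIs.

```python
from bisect import bisect_right


class ToyTermsDict:
    """Toy model of an index-then-scan terms dictionary (not Lucene code)."""

    def __init__(self, blocks):
        # blocks: non-empty sorted lists of terms, themselves in sorted order.
        self.blocks = blocks                  # stand-in for the on-disk blocks
        self.index = [b[0] for b in blocks]   # stand-in for the in-heap index

    def find_block(self, term):
        """Return the number of the only block that may hold term, or None."""
        i = bisect_right(self.index, term) - 1
        return i if i >= 0 else None

    def seek_exact(self, term):
        """Consult the index, then scan the chosen block for the term."""
        block_no = self.find_block(term)
        if block_no is None:
            return False  # term sorts before every block
        for candidate in self.blocks[block_no]:
            if candidate == term:
                return True
            if candidate > term:
                break  # blocks are sorted, so the term cannot appear later
        return False
```

The point of the design is that the index only has to say which block may
contain the term; the scan of that one block gives the definitive answer, so
the in-memory structure can stay much smaller than the full term list.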

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 6, 2016 at 12:04 PM, Mohit Sidana <msidana89@gmail.com> wrote:

> Hello,
>
> I am interested in learning more about how Lucene uses the block tree
> terms dictionary.
>
> While researching this topic I found some useful information at the links
> below.
>
>
> 1.
> http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html
> 2.
> http://blog.mikemccandless.com/2013/09/lucene-now-has-in-memory-terms.html
> 3. http://www.slideshare.net/lucenerevolution/what-is-inaluceneagrandfinal
>
>
> I do understand that Lucene uses an FST to store prefixes of terms in
> memory and to look up terms/postings on disk, but I am unable to visualize
> how the actual search works in Lucene 6.0.
>
> Could someone please suggest a guide I can follow to understand, step by
> step, how a term search actually works with the block terms
> dictionary?
>
> Thanks.
>

--001a113f377a750c990536fefcc6--