From: Michael McCandless
Date: Wed, 6 Jul 2016 18:20:49 -0400
Subject: Re: Lucene Block term Dictionary
To: "Lucene/Solr dev", msidana89@gmail.com
Reply-To: dev@lucene.apache.org

The latest terms dictionary is "block tree", and unfortunately there are no
guides here, besides of course the source code (BlockTreeTermsWriter/Reader).
See especially the comments in those sources: they point to a paper describing
the inspiration for this implementation.

The high-level view is that this terms dictionary breaks up the sorted terms
into variable-sized blocks (25 to 48 terms in each block) at "good"
boundaries, where the term prefixes change, to maximize overall compression.

The in-memory (JVM heap) FST terms index is used to find which on-disk block
may have a given term. So to look up a given term, we walk the FST, then seek
to that block and scan it.

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 6, 2016 at 12:04 PM, Mohit Sidana <msidana89@gmail.com> wrote:

> Hello,
>
> I am interested in learning more about how Lucene uses the block tree term
> dictionary.
>
> While researching this topic I found some useful information at the links
> below:
>
> 1. http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html
> 2. http://blog.mikemccandless.com/2013/09/lucene-now-has-in-memory-terms.html
> 3. http://www.slideshare.net/lucenerevolution/what-is-inaluceneagrandfinal
>
> I understand that Lucene uses <FST> to store prefixes of terms in memory
> and looks up terms/postings on disk, but I am unable to visualize how the
> actual search works in Lucene 6.0.
>
> Could someone suggest a guide I can follow to understand, step by step, how
> a term search actually works with the block terms dictionary?
>
> Thanks.
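
As a rough illustration of the lookup path described in the reply (find the candidate block through the in-memory index, seek to it, then scan), here is a minimal toy sketch in Java. It is not Lucene's implementation, and it makes several simplifying assumptions: the class name `BlockTermsSketch` and its methods are hypothetical, a `TreeMap` keyed by each block's first term stands in for the FST index, in-memory lists stand in for on-disk blocks, a block is cut only when the next term shares no prefix with the block's first term (or the block is full), and terms are ordered by `String.compareTo` rather than by bytes. The block size limits are constructor parameters so that a small example can show more than one block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a block-based terms dictionary. NOT Lucene's actual code. */
final class BlockTermsSketch {
    // Stand-in for the on-disk blocks of terms.
    private final List<List<String>> blocks = new ArrayList<>();
    // Stand-in for the in-heap FST index: first term of each block -> block number.
    private final TreeMap<String, Integer> index = new TreeMap<>();

    /** sortedTerms must be unique and sorted by String.compareTo (an assumption of this sketch). */
    BlockTermsSketch(List<String> sortedTerms, int minBlock, int maxBlock) {
        List<String> current = new ArrayList<>();
        for (String term : sortedTerms) {
            if (!current.isEmpty()) {
                boolean full = current.size() >= maxBlock;
                // Simplified "good boundary": the new term shares no prefix with the block's first term.
                boolean prefixChanged = current.size() >= minBlock
                        && commonPrefixLength(term, current.get(0)) == 0;
                if (full || prefixChanged) {
                    flush(current);
                    current = new ArrayList<>();
                }
            }
            current.add(term);
        }
        if (!current.isEmpty()) {
            flush(current);
        }
    }

    private void flush(List<String> block) {
        index.put(block.get(0), blocks.size());
        blocks.add(block);
    }

    private static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    /** Index lookup picks the only block that may hold the term; a linear scan confirms it. */
    boolean contains(String term) {
        Map.Entry<String, Integer> entry = index.floorEntry(term);
        if (entry == null) {
            return false; // term sorts before the first block
        }
        for (String candidate : blocks.get(entry.getValue())) {
            if (candidate.equals(term)) {
                return true;
            }
        }
        return false;
    }

    int blockCount() {
        return blocks.size();
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList(
                "apple", "apply", "apt", "bar", "bat", "bath", "baton", "bay", "cat");
        BlockTermsSketch dict = new BlockTermsSketch(terms, 2, 4);
        System.out.println(dict.blockCount());      // prints 3
        System.out.println(dict.contains("bath"));  // prints true
        System.out.println(dict.contains("bank"));  // prints false
    }
}
```

The point of the sketch is only the division of labour: the small in-memory structure narrows the search to a single block, so a miss such as "bank" costs one index probe plus one short scan. The real BlockTreeTermsWriter/Reader sources remain the reference for how blocks are actually chosen and encoded.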