lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: FST Builder pruning
Date Tue, 19 Nov 2013 12:14:14 GMT
I think BlockTree should be better (less disk space, RAM and faster
lookups), but if you can make a benchmark comparing the two that would
help confirm/deny!

Mike McCandless

http://blog.mikemccandless.com


On Mon, Nov 18, 2013 at 9:21 PM, Ravikumar Govindarajan
<ravikumar.govindarajan@gmail.com> wrote:
> Many thanks Mike.
>
> In a given document, I usually have 15-20 fields out of which 6-7 fields
> are plain key-value fields.
>
> Typically these key-value fields don't involve prefixes, reflexes, fuzzies
> etc...It's always a full match. Non-existent values are also not possible
> during search.
>
> In such a case, will it be better for me to use the old TII/TIS formats or
> the BlockTree format will be much better for K-V access also?
>
> --
> Ravi
>
> On Monday, November 18, 2013, Michael McCandless wrote:
>
>> Yes, BlockTreeTermsWriter uses freezeTail to figure out where to draw
>> the lines for assigning terms to blocks, but to build the trie terms
>> index it builds a separate FST, by adding in each block's prefix (it
>> doesn't use the FST's builder pruning to create the trie).
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Fri, Nov 15, 2013 at 1:33 PM, Ravikumar Govindarajan
>> <ravikumar.govindarajan@gmail.com <javascript:;>> wrote:
>> > Yeah, now I kind of understood.
>> >
>> > Is this why BlockTreeTermsWriter plugs in it's freezeTail logic of
>> meeting
>> > min-nbr of terms per block and building a trie for locating sub-blocks?
>> >
>> > --
>> > Ravi
>> >
>> >
>> > On Fri, Nov 15, 2013 at 11:17 PM, Michael McCandless <
>> > lucene@mikemccandless.com <javascript:;>> wrote:
>> >
>> >> When you turn on pruning, FST Builder will just remove nodes that
>> >> don't have a high enough count of input terms traversing through them.
>> >>  E.g. if minSuffixCount1 is 100 then only FST nodes that see >= 100
>> >> input terms coming through them, are preserved.
>> >>
>> >> You can use this to build a prefix trie instead of the full FST.
>> >>
>> >> Creating a custom tail freezer is very expert: it lets you implement
>> >> arbitrary logic on which nodes are pruned or not.
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >>
>> >> On Fri, Nov 15, 2013 at 12:16 PM, Ravikumar Govindarajan
>> >> <ravikumar.govindarajan@gmail.com <javascript:;>> wrote:
>> >> > I was trying to understand some logic in Builder class of FST.
>> >> >
>> >> > The method freezeTail() looks quite hairy. I gather that there is an
>> some
>> >> > logic for pruning a node or compiling it.
>> >> >
>> >> > What exactly is pruning a node? An example of it will be really really
>> >> > helpful
>> >> >
>> >> > --
>> >> > Ravi
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org<javascript:;>
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org<javascript:;>
>> >>
>> >>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org<javascript:;>
>> For additional commands, e-mail: java-user-help@lucene.apache.org<javascript:;>
>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message