lucene-java-user mailing list archives

From Michael McCandless <>
Subject Re: what is the format of .tim and .tiq in lucene 4.0 ?
Date Fri, 16 Nov 2012 12:07:52 GMT
The format is unfortunately rather intricate ...

FST = finite state transducer (see e.g. ). We use it to hold the terms
index (*.tip), which is loaded into RAM.
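
To make the FST idea a bit more concrete, here is a toy Java sketch. ToyTransducer is a hypothetical class, not Lucene's org.apache.lucene.util.fst API. It maps each term to a long output (the way the terms index maps a prefix to where a block starts) by storing outputs on arcs and summing them along the path. It is only an unminimized trie with outputs: a real FST also shares common suffixes and is packed into a compact byte encoding.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy transducer: maps strings to long outputs. Each arc may
 * carry an output, and a term's output is the sum along its path (plus a
 * final output on the last node). Prefixes are shared; suffixes are not,
 * so this is an unminimized trie, not a real FST.
 */
class ToyTransducer {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        final Map<Character, Long> arcOutputs = new HashMap<>();
        boolean isFinal;
        long finalOutput;
    }

    private final Node root = new Node();

    /** Adds (or overwrites) the mapping input -> output. */
    void add(String input, long output) {
        Node node = root;
        long remaining = output;   // output not yet accounted for by existing arcs
        boolean placed = false;    // true once remaining has been stored on a new arc
        for (int i = 0; i < input.length(); i++) {
            char label = input.charAt(i);
            Node child = node.children.get(label);
            if (child == null) {
                child = new Node();
                node.children.put(label, child);
                // The first new arc carries the rest of the output; later new arcs carry 0.
                node.arcOutputs.put(label, placed ? 0L : remaining);
                placed = true;
            } else {
                // Shared prefix: reuse the existing arc and subtract its output.
                remaining -= node.arcOutputs.get(label);
            }
            node = child;
        }
        node.isFinal = true;
        node.finalOutput = placed ? 0L : remaining;
    }

    /** Returns the output for input, or null if input was never added. */
    Long get(String input) {
        Node node = root;
        long sum = 0;
        for (int i = 0; i < input.length(); i++) {
            char label = input.charAt(i);
            Node child = node.children.get(label);
            if (child == null) {
                return null;
            }
            sum += node.arcOutputs.get(label);
            node = child;
        }
        if (!node.isFinal) {
            return null;
        }
        return sum + node.finalOutput;
    }
}
```

For example, after add("cat", 10) and add("cats", 12), get("cats") returns 12 and get("ca") returns null. Outputs here are signed sums, so any insertion order works; Lucene's FST builder, by contrast, requires its inputs in sorted order.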

The blocks are what the terms dictionary (*.tim) holds: we encode
between 25 and 48 terms together in each block.  Blocks are picked
according to how terms share prefixes, so that we get better
compression and faster lookup.  It's a variant of a burst trie (see
e.g. ).
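
A rough Java sketch of grouping sorted terms into prefix-sharing blocks follows. ToyBlockBuilder, its greedy cut rule, and the minSize/maxSize parameters are assumptions made for illustration; Lucene's actual block-tree writer chooses blocks by walking the shared prefixes and is considerably more involved. The point is only why prefix sharing compresses well: each block stores its common prefix once, plus a short suffix per term.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified block builder (not Lucene's actual block-tree algorithm). */
class ToyBlockBuilder {
    /** A block stores the prefix shared by all of its terms once, plus each term's suffix. */
    static final class Block {
        final String prefix;
        final List<String> suffixes;

        Block(String prefix, List<String> suffixes) {
            this.prefix = prefix;
            this.suffixes = suffixes;
        }
    }

    static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    /**
     * Greedily groups sorted terms into blocks of at most maxSize terms. Once a
     * block has at least minSize terms, it is closed early rather than letting
     * the next term shorten the block's shared prefix. Only the last block may
     * end up smaller than minSize.
     */
    static List<Block> build(List<String> sortedTerms, int minSize, int maxSize) {
        List<Block> blocks = new ArrayList<>();
        int start = 0;
        while (start < sortedTerms.size()) {
            String first = sortedTerms.get(start);
            int prefixLen = first.length();
            int end = start + 1; // exclusive
            while (end < sortedTerms.size() && end - start < maxSize) {
                int lcp = commonPrefixLength(first, sortedTerms.get(end));
                if (end - start >= minSize && lcp < prefixLen) {
                    break; // big enough, and this term would weaken prefix sharing
                }
                prefixLen = Math.min(prefixLen, lcp);
                end++;
            }
            List<String> suffixes = new ArrayList<>();
            for (int i = start; i < end; i++) {
                suffixes.add(sortedTerms.get(i).substring(prefixLen));
            }
            blocks.add(new Block(first.substring(0, prefixLen), suffixes));
            start = end;
        }
        return blocks;
    }
}
```

With minSize 2 and maxSize 4, the terms car, card, care, cart, dog, dot, dove come out as two blocks: prefix "car" with suffixes "", "d", "e", "t", and prefix "do" with suffixes "g", "t", "ve".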

The index points to the start of each block, so when looking up a
term we use the index to figure out which block may contain the term
(if any), seek there, and scan for it.
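
The lookup path (index, then seek, then scan) can be sketched as below. ToyTermsReader is hypothetical: a sorted array of each block's first term stands in for the FST index, and an in-memory list of blocks stands in for the on-disk terms dictionary.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of "use the index to pick a block, seek there, scan".
 * A sorted array of each block's first term stands in for the FST index,
 * and an in-memory list of blocks stands in for the on-disk terms file.
 */
class ToyTermsReader {
    private final List<List<String>> blocks; // non-empty blocks, terms globally sorted
    private final String[] firstTerms;       // the toy "index"

    ToyTermsReader(List<List<String>> blocks) {
        this.blocks = blocks;
        this.firstTerms = new String[blocks.size()];
        for (int i = 0; i < blocks.size(); i++) {
            firstTerms[i] = blocks.get(i).get(0);
        }
    }

    /** Index step: the only block that may hold term, or -1 if term sorts before all blocks. */
    int findBlock(String term) {
        int pos = Arrays.binarySearch(firstTerms, term);
        // On a miss, binarySearch returns -(insertionPoint) - 1; we want the block before that point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Seek to the candidate block, then scan it in order. */
    boolean contains(String term) {
        int b = findBlock(term);
        if (b < 0) {
            return false;
        }
        for (String t : blocks.get(b)) {
            int cmp = t.compareTo(term);
            if (cmp == 0) {
                return true;
            }
            if (cmp > 0) {
                return false; // scanned past where the term would be
            }
        }
        return false;
    }
}
```

For example, with blocks [car, card, care, cart] and [dog, dot, dove], looking up "cat" picks the first block, scans it, and reports a miss, while "apple" is rejected by the index step alone.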

Mike McCandless

On Fri, Nov 16, 2012 at 3:57 AM, wgggfiy <> wrote:
> Hi, guys. I'm now studying Lucene 4.0 and have run into difficulties. Compared
> to previous versions, the term dictionary is different in this version. What is
> a block? And what is the FST? Help me, thx.