lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <>
Subject Re: FST and FieldCache?
Date Thu, 19 May 2011 10:22:11 GMT
On Thu, May 19, 2011 at 6:16 AM, Dawid Weiss
<> wrote:
>> We should add this (lookup by value, when value is guaranteed to
>> monotonically increase as the key increases) to our core FST APIs?
>> It's generically useful in many places ;)  I'll open an issue.
> The data structure itself should sort of "build itself" if you create
> an FST with increasing integers because the "shared suffix" should be
> pushed towards the root anyway, so the only thing would be to correct
> values on all outgoing arcs (they need to contain the count of leaves
> on the subtree).... but then, this may be tricky if arc values are
> vcoded... I'd have to think how to do this.

I think, if we add ord as an output to the FST, then it builds
everything we need?  Ie no further data structures should be needed?
Maybe I'm confused :)

>> While FST w/ lookup-by-monotonic-value would work here, I would be
>> worried about the per hit of that representation vs what
> There are actually two things:
> a) performance; you need to descend in the automaton and some
> bookkeeping to maintain the count of nodes; this adds overhead,
> b) size; the procedure for storing/ calculating perfect hashes I
> described requires leaf counts on each arc and these are usually large
> integers. Even vcoded they bloat the resulting data structure.

Maybe we should iterate on the issue to get down to the specifics?  I
had thought there wouldn't be any backtracking, if the FST had stored
the ord as an output...


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message