lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: FST and FieldCache?
Date Thu, 19 May 2011 10:22:11 GMT
On Thu, May 19, 2011 at 6:16 AM, Dawid Weiss
<dawid.weiss@cs.put.poznan.pl> wrote:
>> We should add this (lookup by value, when value is guaranteed to
>> monotonically increase as the key increases) to our core FST APIs?
>> It's generically useful in many places ;)  I'll open an issue.
>
> The data structure itself should sort of "build itself" if you create
> an FST with increasing integers because the "shared suffix" should be
> pushed towards the root anyway, so the only thing would be to correct
> values on all outgoing arcs (they need to contain the count of leaves
> on the subtree).... but then, this may be tricky if arc values are
> vcoded... I'd have to think how to do this.

I think, if we add ord as an output to the FST, then it builds
everything we need?  Ie no further data structures should be needed?
Maybe I'm confused :)

>> While FST w/ lookup-by-monotonic-value would work here, I would be
>> worried about the per hit of that representation vs what
>
> There are actually two things:
>
> a) performance; you need to descend in the automaton and some
> bookkeeping to maintain the count of nodes; this adds overhead,
>
> b) size; the procedure for storing/ calculating perfect hashes I
> described requires leaf counts on each arc and these are usually large
> integers. Even vcoded they bloat the resulting data structure.

Maybe we should iterate on the issue to get down to the specifics?  I
had thought there wouldn't be any backtracking, if the FST had stored
the ord as an output...

Mike

http://blog.mikemccandless.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message