lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dawid Weiss <dawid.we...@cs.put.poznan.pl>
Subject Re: FST and FieldCache?
Date Thu, 19 May 2011 10:16:49 GMT
> We should add this (lookup by value, when value is guaranteed to
> monotonically increase as the key increases) to our core FST APIs?
> It's generically useful in many places ;)  I'll open an issue.

The data structure itself should sort of "build itself" if you create
an FST with increasing integers because the "shared suffix" should be
pushed towards the root anyway, so the only thing would be to correct
values on all outgoing arcs (they need to contain the count of leaves
on the subtree).... but then, this may be tricky if arc values are
vcoded... I'd have to think how to do this.

> While FST w/ lookup-by-monotonic-value would work here, I would be
> worried about the per hit of that representation vs what

There are actually two things:

a) performance; you need to descend in the automaton and some
bookkeeping to maintain the count of nodes; this adds overhead,

b) size; the procedure for storing/ calculating perfect hashes I
described requires leaf counts on each arc and these are usually large
integers. Even vcoded they bloat the resulting data structure.

Dawid

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message