lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dawid Weiss <dawid.we...@cs.put.poznan.pl>
Subject Re: FST and FieldCache?
Date Thu, 19 May 2011 10:36:33 GMT
> I think, if we add ord as an output to the FST, then it builds
> everything we need?  Ie no further data structures should be needed?
> Maybe I'm confused :)

If you put the ord as an output the common part will be shifted towards the
front of the tree. This will work if you want to look up a given value
assigned to some string, but will not work if you need to look up the string
from its value. The latter case can be solved if you "know" which branch to
take while descending from root and the "shared prefix" alone won't give you
this information. At least I don't see how it could.

I am familiar with the basic prefix hashing procedure suggested by Daciuk
(and other authors), but maybe some progress has been made there, I don't
know... the one I know is really conceptually simple -- since each arc
encodes the number of leaves (or input sequences) in the automaton, you know
which path must lead you to your string. For example if you have a node like
this and seek for the 12-th term:

0 -- 10 -- ...
  +- 10 -- ...
  +- 5 -- ..

you look at the first path, it'd give you terms 1..10, then the next one
contains terms 11..20 so you add 10 to an internal counter which is added to
further computations, descend and repeat the procedure until you find a leaf
node.

Dawid

Mime
View raw message