> For steps 2 and 3 you shouldn't use FST at all. Instead, for 2) use
> BasicAutomata.makeString(String) on each of your expanded terms, then
> BasicOperations.union on all of those automata to make a single
How many input strings do you have? The API Mike mentioned is from a
port of the Brics library -- making separate automata and then taking
their union will result in an attempt to minimize the result, and this
(when the set of input strings is large) is a no-no in terms of memory
(my own experience).
I've added a method to Brics that creates an optimized automaton from
a union of Strings in one step, but I see this hasn't been ported to
Lucene yet.
http://www.brics.dk/automaton/doc/dk/brics/automaton/BasicAutomata.html#makeStringUnion(java.lang.CharSequence...)
If you could provide a patch that would port that code to Lucene it'd
be great (I guess it's trivial) and would speed up your step (1)
greatly.
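
To illustrate the "one step" idea, here is a rough, self-contained
sketch. It is NOT the Brics makeStringUnion code and it does not use
the Lucene or Brics API: it just inserts every term into a single
deterministic, prefix-sharing structure (a plain trie), so no
per-term automata and no separate union/minimize pass are needed.
The class and method names (StringUnionSketch, build, accepts,
countStates) are made up for this example, and the result is
deterministic but not minimal, since suffixes are not shared.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Lucene or Brics.
class StringUnionSketch {

    /** One state of a deterministic automaton, transitions keyed by char. */
    static final class State {
        final Map<Character, State> next = new TreeMap<>();
        boolean accept;
    }

    /** Builds one automaton accepting exactly the given terms, in a single pass. */
    static State build(List<String> terms) {
        State root = new State();
        for (String term : terms) {
            State s = root;
            for (int i = 0; i < term.length(); i++) {
                // Reuse the existing transition if a previous term shared this prefix.
                s = s.next.computeIfAbsent(term.charAt(i), c -> new State());
            }
            s.accept = true;
        }
        return root;
    }

    /** Returns true if the automaton rooted at 'root' accepts 'input'. */
    static boolean accepts(State root, String input) {
        State s = root;
        for (int i = 0; i < input.length() && s != null; i++) {
            s = s.next.get(input.charAt(i));
        }
        return s != null && s.accept;
    }

    /** Counts the states reachable from 'root' (the structure is a tree). */
    static int countStates(State root) {
        int count = 1;
        for (State child : root.next.values()) {
            count += countStates(child);
        }
        return count;
    }

    public static void main(String[] args) {
        State root = build(Arrays.asList("lucene", "lucid", "luck"));
        System.out.println("accepts lucid: " + accepts(root, "lucid"));
        System.out.println("accepts luc:   " + accepts(root, "luc"));
        System.out.println("states:        " + countStates(root));
    }
}
```

The three terms above share the prefix "luc", so that prefix is
stored once instead of once per term. The Brics method goes further
and produces an optimized automaton; this sketch only shows why
building the union directly avoids the intermediate automata.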
Dawid