lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Brics Automaton version
Date Mon, 21 Jun 2010 23:26:50 GMT
On Mon, Jun 21, 2010 at 3:16 PM, eks dev <eksdev@yahoo.co.uk> wrote:

> i would even argue it makes sense to keep some (all?) of these methods,
> especially if intended use of the Automaton code gets expanded to Analyzer
> chains. This particular method has usage in our code for optimizing matching
> based on minimum possible length that can get accepted.
>
>
by the way, I think your use case here is a perfect example where 'dropping
in' the additional code wouldn't necessarily lead to the expected behavior,
due to the differences I mentioned.

I'm gonna assume really quick (please correct me if I am wrong!) that you
want to use getShortestExample so you know the minimal length of char[] that
can possibly match a DFA. With this information you could simply test
TermAttribute.termLength() and fail fast if the term is too short,
without even invoking .run()
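
A minimal sketch of that fail-fast idea, in plain Java with no Lucene dependency. The names here (FailFastMatcher, minLength, fullMatchCalls) are hypothetical, and the Predicate is only a stand-in for the expensive automaton run; in real code the minimum would be derived from the shortest accepted string.

```java
import java.util.function.Predicate;

public class FailFastMatcher {
    // Assumption: length of the shortest input the matcher can accept.
    private final int minLength;
    // Stand-in for the expensive full match (e.g. running a DFA over the term).
    private final Predicate<String> fullMatch;
    // Counts how often the expensive path was actually taken.
    public int fullMatchCalls = 0;

    public FailFastMatcher(int minLength, Predicate<String> fullMatch) {
        this.minLength = minLength;
        this.fullMatch = fullMatch;
    }

    public boolean matches(String term) {
        // Cheap length test first: a term shorter than the minimum can never match.
        if (term.length() < minLength) {
            return false;
        }
        fullMatchCalls++;
        return fullMatch.test(term);
    }

    public static void main(String[] args) {
        FailFastMatcher m = new FailFastMatcher(3, s -> s.startsWith("abc"));
        System.out.println(m.matches("ab"));    // rejected by the length test alone
        System.out.println(m.matches("abcd"));  // passes the length test, then the full match
        System.out.println(m.fullMatchCalls);   // number of full matches performed
    }
}
```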

but this wouldn't be quite the same with org.apache.lucene.util.automaton,
since its transitions are not over code units (char) but code points (int). not
saying your optimization wouldn't still work, but from a more general and
practical perspective, it's definitely 'different'.
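
To make the code unit vs. code point difference concrete, here is a small sketch using only the JDK. CodePointLength and codePointLength are hypothetical names for illustration; Character.codePointCount is the standard JDK method.

```java
public class CodePointLength {
    /** Number of Unicode code points in the first `length` chars of `buffer`. */
    public static int codePointLength(char[] buffer, int length) {
        return Character.codePointCount(buffer, 0, length);
    }

    public static void main(String[] args) {
        // U+1D50A lies outside the BMP, so UTF-16 encodes it as a surrogate pair
        // (two chars). Followed by 'a', the term is 3 code units but 2 code points.
        char[] term = new StringBuilder()
                .appendCodePoint(0x1D50A)
                .append('a')
                .toString()
                .toCharArray();

        System.out.println(term.length);                        // length in code units (char)
        System.out.println(codePointLength(term, term.length)); // length in code points (int)
    }
}
```

Since every code point takes one or two chars, the char length is never smaller than the code point count. Comparing a char length against a minimum counted in code points therefore never rejects a term that could match; it just lets some too-short terms through to the full run.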

-- 
Robert Muir
rcmuir@gmail.com
