lucene-java-user mailing list archives

From Dawid Weiss <>
Subject Re: Binary Automaton
Date Sat, 30 Sep 2017 18:58:36 GMT
> Preface: I don't know how the automaton is implemented deep inside Lucene,

Well, you can take a look, it's open source. :) There are two
different finite state automata inside Lucene: one is pretty much a
"read-only" transducer from unique input sequences (of bytes) into an
output. This is the FST<?> class. The other is the Automaton class, which
has been ported from the Brics library [1].
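
To make the distinction concrete, here is a rough sketch in plain Java.
It does not use Lucene's FST or Automaton classes, and every name in it
(ByteAutomataSketch, State, put, lookup, accepts) is hypothetical. It
assumes a naive trie for the transducer; a real FST is far more compact
because it also shares suffixes. The point is only that a transducer maps
each stored byte sequence to an output, while an acceptor just says yes or
no for a (possibly infinite) set of byte sequences.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: NOT Lucene's FST or Automaton implementation. */
final class ByteAutomataSketch {

    /** One state; transitions are keyed by an unsigned byte value (0..255). */
    static final class State {
        final Map<Integer, State> next = new HashMap<>();
        boolean accept; // true if a sequence may end in this state
        Long output;    // transducer output; null when the state carries none
    }

    /** Follows the transitions for each input byte; null if the walk dies. */
    private static State walk(State start, byte[] input) {
        State s = start;
        for (byte b : input) {
            s = s.next.get(b & 0xFF);
            if (s == null) {
                return null;
            }
        }
        return s;
    }

    /** Acceptor view: is the byte sequence in the automaton's language? */
    static boolean accepts(State start, byte[] input) {
        State s = walk(start, input);
        return s != null && s.accept;
    }

    /** Transducer view: stores a byte sequence with its output (trie-style). */
    static void put(State start, byte[] key, long output) {
        State s = start;
        for (byte b : key) {
            s = s.next.computeIfAbsent(b & 0xFF, k -> new State());
        }
        s.accept = true;
        s.output = output;
    }

    /** Transducer view: the output stored for the sequence, or null. */
    static Long lookup(State start, byte[] input) {
        State s = walk(start, input);
        return (s != null && s.accept) ? s.output : null;
    }

    public static void main(String[] args) {
        // Transducer: unique byte sequences mapped to outputs.
        State dict = new State();
        put(dict, "cat".getBytes(StandardCharsets.UTF_8), 5L);
        put(dict, "car".getBytes(StandardCharsets.UTF_8), 7L);
        System.out.println(lookup(dict, "cat".getBytes(StandardCharsets.UTF_8))); // prints 5
        System.out.println(lookup(dict, "ca".getBytes(StandardCharsets.UTF_8)));  // prints null

        // Acceptor: byte 0xCA followed by any number of 0xFE bytes.
        State s0 = new State();
        State s1 = new State();
        s0.next.put(0xCA, s1);
        s1.next.put(0xFE, s1); // self-loop, so the language is infinite
        s1.accept = true;
        byte[] probe = {(byte) 0xCA, (byte) 0xFE, (byte) 0xFE};
        System.out.println(accepts(s0, probe)); // prints true
    }
}
```

Note that both views work on raw bytes; nothing here depends on the input
being a string.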

I can't really relate to your comment about fast querying for
sub-automata; sounds interesting though. Dig in the code and suggest a
patch (or even demonstrate what you came up with!).



> but (considering that the automaton is built on the fly when the index is
> already present) I imagine that the automaton scans the lexicon/tokens
> present in the Lucene index to find the document references (solution
> 1).
> In my opinion there are two different generic solutions for using
> automata:
> 1) create an automaton for parsing the tokens present in the Lucene table,
> as described above.
> 2) create a pattern-matching automaton (on binary data or, better, on an
> abstract stream, which could be more generic) and put its states directly
> in an index. In this case you can very quickly retrieve the documents
> matching a specific automaton built when you created the index (or a
> sub-automaton representing a subset of the same states). The second
> solution could maybe be used to map a complex structure inside a single
> Lucene document field, so that you can then find the nested information
> embedded in it. This way I don't need to use multiple Lucene documents
> (which could create performance and scalability problems).
> In many cases this solution could be faster than actual joins, for
> example, and be useful in bioinformatics or in all those cases where the
> data is not a basic ADT.
> Cristian
> 2017-09-30 12:24 GMT+02:00 Dawid Weiss <>:
>> > Hi, is it possible to create an Automaton in Lucene by parsing not a
>> > string but a byte array?
>> Can you state what problem you are trying to solve? This seems to be a
>> question stripped of a more general context -- why do you need those
>> byte-based automata?
>> Dawid