lucene-java-user mailing list archives

From Valentin Popov <valentin...@gmail.com>
Subject Automaton -> SpanMultiTermQueryWrapper with lucene 4.10.2
Date Wed, 24 Dec 2014 13:59:15 GMT
Hello, everyone. 

I have a perhaps non-standard question: how can Automaton and SpanMultiTermQueryWrapper be used together?


The main idea is to solve a search problem: I need to find combinations of
numbers in text. A combination is any string matching [0-9]{3}-[0-9]{3}-[0-9]{3},
excluding those that have 111 in any position.

So the DFA for this is built as follows:


Automaton full = new RegExp("[0-9]{3}.[0-9]{3}.[0-9]{3}").toAutomaton();
Automaton exclude1 = new RegExp("111.[0-9]{3}.[0-9]{3}").toAutomaton();
Automaton exclude2 = new RegExp("[0-9]{3}.111.[0-9]{3}").toAutomaton();
Automaton exclude3 = new RegExp("[0-9]{3}.[0-9]{3}.111").toAutomaton();
	    
full = Operations.minus(full, exclude1);
full = Operations.minus(full, exclude2);
full = Operations.minus(full, exclude3);
full = MinimizationOperations.minimize(full);

Query query = new AutomatonQuery(new Term("body"), full);

This query works fine against a single term such as 123-456-789, but we use StandardAnalyzer
for the body field, so the text is split into three terms {123, 456, 789} and the search does not match.

I want to use SpanMultiTermQueryWrapper to match the terms in order, using the automaton rules.
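
To illustrate why an ordered, per-term match might be the right shape here, the following is a minimal plain-Java sketch, not Lucene code. It assumes that the whole-string rule can be decomposed into a per-token rule ("three digits, but not 111") applied to three consecutive tokens. The class name TokenRuleSketch, its methods, and the split on '-' (a crude stand-in for what StandardAnalyzer does to this input) are all hypothetical, and java.util.regex with a negative lookahead stands in for the automaton subtraction. The separator is written as '-' as in the prose, whereas the RegExp strings above use '.'.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only: plain java.util.regex, no Lucene classes.
class TokenRuleSketch {
    // Whole-string rule: three 3-digit groups separated by '-', none equal to 111.
    // The lookahead (?!111) plays the role of the automaton subtraction.
    private static final Pattern WHOLE =
        Pattern.compile("(?!111)[0-9]{3}-(?!111)[0-9]{3}-(?!111)[0-9]{3}");

    // Per-token rule: exactly three digits, but not 111.
    private static final Pattern TOKEN = Pattern.compile("(?!111)[0-9]{3}");

    public static boolean matchesWhole(String text) {
        return WHOLE.matcher(text).matches();
    }

    // Assumption: splitting on '-' approximates the analyzer for this input.
    public static String[] tokenize(String text) {
        return text.split("-");
    }

    // True if any single token satisfies the whole-string rule.
    public static boolean anySingleTokenMatchesWhole(String text) {
        for (String token : tokenize(text)) {
            if (matchesWhole(token)) {
                return true;
            }
        }
        return false;
    }

    // True if there are exactly three tokens and each satisfies the per-token rule.
    public static boolean matchesPerToken(String text) {
        String[] tokens = tokenize(text);
        if (tokens.length != 3) {
            return false;
        }
        for (String token : tokens) {
            if (!TOKEN.matcher(token).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String sample = "123-456-789";
        System.out.println(matchesWhole(sample));
        System.out.println(anySingleTokenMatchesWhole(sample));
        System.out.println(matchesPerToken(sample));
    }
}
```

Under these assumptions, no single token can satisfy the whole-string rule, while the per-token rule applied to three adjacent tokens expresses the same constraint. That suggests each wrapped clause would need an automaton for one token only, with term order and adjacency handled by the span query rather than by the automaton.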


Any clue? 

Thanks


PS. A test case for the automaton itself works fine:

Operations.subsetOf(Automata.makeString("123-456-789"), full); => true
Operations.subsetOf(Automata.makeString("111-456-789"), full); => false

Regards,
Valentin Popov




