lucene-solr-user mailing list archives

From "tomas.kalas" <kala...@email.cz>
Subject Tokenizer or Filter ?
Date Fri, 09 Jan 2015 11:47:30 GMT
Hello, I have a question: should I use a tokenizer or a filter?
I need to separate two channels. I wrote about this here earlier, but I
realized that it is probably not possible with Solr's basic tools, so I am
trying to write my own tool for this task.
I have this input:
<d1>Hello</d1><d2>Hello</d2><d1>How are you?</d1><d2>Fine and you're?</d2> ....
d1 - direction1
d2 - direction2
I want to output only the d1 segments and then search for some words within
that result. For example, the output should be:
Output: [<d1>Hello</d1>, <d1>How are you?</d1>, <d1>....</d1>, ....]


I wrote my idea in Java, but I don't know where to incorporate it: in a
Filter or in a Tokenizer? Do you have any advice on how to start? I probably
have to extend some Lucene class and plug my modified version in there, don't I?

Here is my code:

package test1;
import java.util.Arrays;

public class Test1 {


    public static void main(String[] args) {
        String dialogue = "<d1>Hello</d1><d2>Hello</d2><d1>How are you?</d1><d2>Fine and you're?</d2> ....";

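        // Split between a closing </d1> or </d2> tag and the next opening tag
        // (any digits between them are consumed). The lookbehind and lookahead
        // are zero-width, so the tags themselves stay in the pieces.
        String[] input = dialogue.split("(?<=</d[12]>)\\d*(?=<d[12]>)");

        // First pass: count the d1 segments so the result array can be sized.
        int countD1 = 0;
        for (String input1 : input) {
            if (input1.startsWith("<d1>")) {
                countD1++;
            }
        }

        // Second pass: copy the d1 segments into the result array.
        String[] d1 = new String[countD1];
        int array = 0;
        for (String input1 : input) {
            if (input1.startsWith("<d1>")) {
                d1[array] = input1;
                array++;
            }
        }

        String d1Out = Arrays.toString(d1);
        System.out.println(d1Out);
        // Return d1Out
    }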
}
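
The same extraction can probably be written more compactly by matching the d1 segments directly instead of splitting and filtering in two passes. This is only a sketch in plain Java (java.util.regex), not Lucene or Solr code; the class name D1Extractor and the method extract are made-up names for illustration, and it assumes the tags are never nested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class D1Extractor {

    // Hypothetical helper: returns every <tag>...</tag> segment of the input, in order.
    // Assumes segments of the same tag are not nested inside each other.
    static List<String> extract(String input, String tag) {
        String quoted = Pattern.quote(tag);
        // The reluctant ".*?" stops at the first closing tag;
        // DOTALL lets a segment span line breaks.
        Pattern p = Pattern.compile("<" + quoted + ">.*?</" + quoted + ">", Pattern.DOTALL);
        Matcher m = p.matcher(input);
        List<String> segments = new ArrayList<>();
        while (m.find()) {
            segments.add(m.group());
        }
        return segments;
    }

    public static void main(String[] args) {
        String dialogue = "<d1>Hello</d1><d2>Hello</d2><d1>How are you?</d1>"
                + "<d2>Fine and you're?</d2> ....";
        // Prints [<d1>Hello</d1>, <d1>How are you?</d1>]
        System.out.println(extract(dialogue, "d1"));
    }
}
```

Passing "d2" instead of "d1" would give the other direction, so one helper covers both channels.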

Thanks for your advice.



--
View this message in context: http://lucene.472066.n3.nabble.com/Tokenizer-or-Filter-tp4178346.html
Sent from the Solr - User mailing list archive at Nabble.com.
