lucene-solr-user mailing list archives

From "tomas.kalas" <>
Subject Tokenizer or Filter ?
Date Fri, 09 Jan 2015 11:47:30 GMT
Hello, I have a question: should I use a tokenizer or a filter?
I need to separate two channels. I wrote about this here earlier, but realized
that it is probably not possible with Solr's basic tools, so I am trying to
write my own tool for this task.
I have this input: <d1>Hello</d1><d2>Hello</d2><d1>How are you?</d1><d2>Fine
and you're?</d2> ....
d1 - direction1
d2 - direction2
I want to output only d1 and then search for some words within that result. For
example, the output should be:
Output: [<d1>Hello</d1>,<d1>How are you?</d1><d1>....</d1>....]

I wrote my idea in Java, but I don't know where to incorporate it: in a
Filter or in a Tokenizer? Any advice on how to start? I probably have to extend
some Lucene class and include my modified version there, don't I?

Here is my code:

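package test1;

import java.util.Arrays;

public class Test1 {

    public static void main(String[] args) {
        String dialogue = "<d1>Hello</d1><d2>Hello</d2><d1>How are you?</d1><d2>Fine and you're?</d2> ....";

        // Split between a closing tag and the next opening tag
        // (optional whitespace between the tags is dropped).
        String[] input = dialogue.split("(?<=</d[12]>)\\s*(?=<d[12]>)");

        // First pass: count the d1 segments to size the result array.
        int countD1 = 0;
        for (String input1 : input) {
            if (input1.startsWith("<d1>")) {
                countD1++;
            }
        }

        // Second pass: copy the d1 segments into the result array.
        String[] d1 = new String[countD1];
        int array = 0;
        for (String input1 : input) {
            if (input1.startsWith("<d1>")) {
                d1[array] = input1;
                array++;
            }
        }

        String d1Out = Arrays.toString(d1);
        // Return d1Out (printed here, since main cannot return a value)
        System.out.println(d1Out);
    }
}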
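
The same d1 extraction can probably be done in a single pass with java.util.regex, without counting first. This is only a rough sketch of the string handling, not of how it would plug into a Lucene Tokenizer or Filter. The class name D1Extractor and the method extractD1 are made up for illustration, and the sketch assumes that the d1 tags are always closed and never nested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or Solr.
public class D1Extractor {

    // Matches one complete <d1>...</d1> segment (reluctant, so it stops at
    // the first closing tag); DOTALL lets the text span line breaks.
    private static final Pattern D1 = Pattern.compile("<d1>.*?</d1>", Pattern.DOTALL);

    // Returns every d1 segment of the dialogue, in order of appearance.
    static List<String> extractD1(String dialogue) {
        List<String> result = new ArrayList<>();
        Matcher m = D1.matcher(dialogue);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String dialogue = "<d1>Hello</d1><d2>Hello</d2><d1>How are you?</d1><d2>Fine and you're?</d2> ....";
        // Prints: [<d1>Hello</d1>, <d1>How are you?</d1>]
        System.out.println(extractD1(dialogue));
    }
}
```

Unlike the split-based version, this ignores anything outside the d1 tags, such as the trailing " ....".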

Thanks for your advice.
