lucene-java-user mailing list archives

From hariram ravichandran <hariramravichan...@gmail.com>
Subject Tokens produced by Shingle filter are not added in the query
Date Mon, 24 Jul 2017 14:53:42 GMT
I'm using Lucene 4.10.4 and trying to build shingles (combinations of
adjacent tokens).


Code:

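import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.analysis.standard.ClassicFilter;

public class CustomAnalyzer extends Analyzer {
    @Override
    protected Analyzer.TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
        // Split on whitespace, then emit unigrams plus shingles of 2 to 3 tokens.
        final WhitespaceTokenizer src = new WhitespaceTokenizer(getVersion(), reader);
        TokenStream tok = new ShingleFilter(src, 2, 3);
        tok = new ClassicFilter(tok);
        tok = new LowerCaseFilter(tok);
//        tok = new SynonymFilter(tok, SynonymDictionary.getSynonymMap(), true);
        return new Analyzer.TokenStreamComponents(src, tok);
    }
}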

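import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

public class Test {
    public static void main(String[] args) throws Exception {
        CustomAnalyzer analyzer = new CustomAnalyzer();
        String queryStr = "cup board";

        // Print the tokens the analyzer produces for the raw query string.
        TokenStream ts = analyzer.tokenStream("n", new StringReader(queryStr));
        ts.reset();
        System.out.println("Tokens are :");
        while (ts.incrementToken()) {
            System.out.print(ts.getAttribute(CharTermAttribute.class) + ", ");
        }
        ts.end();
        ts.close();

        // Parse the same string with the same analyzer and print the query.
        QueryParser parser = new QueryParser("n", analyzer);
        Query query = parser.parse(queryStr);
        System.out.println("\nQuery is");
        System.out.print(query.toString());
    }
}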



> Output:
> Tokens are :
> cup, cup board, board
> Query is
> n:cup n:board
>

The tokens are printed as expected, so I expected the resulting query to be
*n:cup n:board n:cup board*. However, the tokens produced by the shingle
filter are not added to the query; I get only *n:cup n:board*. Where is my
mistake?
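
To make the expected tokens concrete, here is a minimal plain-Java sketch of
word shingling that does not depend on Lucene. The class and method names
(ShingleSketch, shingles) are hypothetical, and the loop only approximates
what ShingleFilter emits when unigram output is enabled; it is not the
filter's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
class ShingleSketch {

    // Returns each word followed by the shingles of minSize..maxSize words
    // that start at that word, in input order.
    static List<String> shingles(String text, int minSize, int maxSize) {
        String[] words = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int start = 0; start < words.length; start++) {
            out.add(words[start]); // the unigram at this position
            for (int size = minSize; size <= maxSize && start + size <= words.length; size++) {
                // Join `size` adjacent words beginning at `start`.
                out.add(String.join(" ", Arrays.copyOfRange(words, start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles("cup board", 2, 3)); // prints [cup, cup board, board]
    }
}
```

Note that a single-word input such as shingles("cup", 2, 3) yields only
[cup]: a shingle needs at least two tokens in the same stream. It may
therefore be worth checking whether the query parser passes the whole string
or one word at a time to the analyzer.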

Thanks.
