lucene-java-user mailing list archives

From Steve Rowe <sar...@gmail.com>
Subject Re: Tokens produced by Shingle filter are not added in the query
Date Mon, 24 Jul 2017 18:22:23 GMT
hariram,

Until Lucene 6.2, there was no way for the classic query parser to *not* first split on whitespace
before sending text to the analyzer.  As a result, filters like ShingleFilter that operate
on multiple tokens only ever see one token at a time; in your example, first “cup” as
the full text to analyze, and then, separately, “board”.  Under those conditions
ShingleFilter cannot form any multi-token shingles.
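
To make that concrete, here's a rough sketch (untested, using the CustomAnalyzer from your mail on the 4.10.x API) of what the classic query parser effectively does: it hands each whitespace-separated chunk to the analyzer on its own, so ShingleFilter never has two tokens to combine:

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class PerChunkAnalysisDemo {
    public static void main(String[] args) throws Exception {
        // Analyze each whitespace-separated chunk separately, the way the classic query parser does.
        for (String chunk : new String[] { "cup", "board" }) {
            TokenStream ts = new CustomAnalyzer().tokenStream("n", new StringReader(chunk));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.print(term + ", ");  // prints only "cup, " and then "board, " - no shingles
            }
            ts.end();
            ts.close();
        }
    }
}

(PerChunkAnalysisDemo is just a throwaway name for illustration.)  Shingles only show up when the analyzer sees the whole string at once, as in your Test class.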

For more details see <https://issues.apache.org/jira/browse/LUCENE-2605>.
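
If you're able to upgrade to Lucene 6.2 or later, the classic QueryParser gained a setSplitOnWhitespace() option as part of that issue.  Roughly, it would look like this (untested sketch; in 6.x the default is still true for back-compat, and your CustomAnalyzer would need small changes for the 6.x Analyzer API):

import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

public class NoWhitespaceSplitDemo {
    public static void main(String[] args) throws Exception {
        QueryParser parser = new QueryParser("n", new CustomAnalyzer());
        parser.setSplitOnWhitespace(false);  // 6.2+ only: let the analyzer see "cup board" as one string
        Query query = parser.parse("cup board");
        System.out.println(query);           // the "cup board" shingle should now appear as a query term
    }
}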

--
Steve
www.lucidworks.com

> On Jul 24, 2017, at 2:00 PM, hariram ravichandran <hariramravichandar@gmail.com>
wrote:
> 
> Hi Steve,
>    I'm sorry. That's also CustomAnalyzer.
> 
>> public class CustomAnalyzer extends Analyzer {
>>     @Override
>>     protected Analyzer.TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
>>         final WhitespaceTokenizer src = new WhitespaceTokenizer(getVersion(), reader);
>>         TokenStream tok = new ShingleFilter(src, 2, 3);
>>         tok = new ClassicFilter(tok);
>>         tok = new LowerCaseFilter(tok);
>> //      tok = new SynonymFilter(tok, SynonymDictionary.getSynonymMap(), true);
>>         return new Analyzer.TokenStreamComponents(src, tok);
>>     }
>> }
>> 
>> 
>> public class Test {
>>     public static void main(String[] args) throws Exception {
>>         CustomAnalyzer analyzer = new CustomAnalyzer();
>>         String queryStr = "cup board";
>>         TokenStream ts = new CustomAnalyzer().tokenStream("n", new StringReader(queryStr));
>>         ts.reset();
>>         System.out.println("Tokens are :");
>>         while (ts.incrementToken()) {
>>             System.out.print(ts.getAttribute(CharTermAttribute.class) + ", ");
>>         }
>>         QueryParser parser = new QueryParser("n", analyzer);
>>         Query query = null;
>>         query = parser.parse(queryStr);
>>         System.out.println("\nQuery is");
>>         System.out.print(query.toString());
>>     }
>> }
> 
> 
> Output:
>> Tokens are :
>> cup, cup board, board
>> Query is n
>> n:cup n:board
>> 
> 
> 
> On Mon, Jul 24, 2017 at 11:08 PM, Steve Rowe <sarowe@gmail.com> wrote:
> 
>> Hi hariram,
>> 
>> There may be other problems, but at a minimum you have two different
>> analysis classes here.  You’re printing the output stream from one
>> (CustomSynonymAnalyzer, the source of which is not shown in your email),
>> but constructing a query from a different one (CustomAnalyzer).
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On Jul 24, 2017, at 10:53 AM, hariram ravichandran <
>> hariramravichandar@gmail.com> wrote:
>>> 
>>> I'm using Lucene 4.10.4 and trying to construct (shingles) combinations
>> of
>>> tokens.
>>> 
>>> 
>>> Code:
>>> 
>>> public class CustomAnalyzer extends Analyzer {
>>>     @Override
>>>     protected Analyzer.TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
>>>         final WhitespaceTokenizer src = new WhitespaceTokenizer(getVersion(), reader);
>>>         TokenStream tok = new ShingleFilter(src, 2, 3);
>>>         tok = new ClassicFilter(tok);
>>>         tok = new LowerCaseFilter(tok);
>>> //      tok = new SynonymFilter(tok, SynonymDictionary.getSynonymMap(), true);
>>>         return new Analyzer.TokenStreamComponents(src, tok);
>>>     }
>>> }
>>> 
>>> public class Test {
>>>     public static void main(String[] args) throws Exception {
>>>         CustomSynonymAnalyzer analyzer = new CustomSynonymAnalyzer();
>>>         String queryStr = "cup board";
>>>         TokenStream ts = new CustomAnalyzer().tokenStream("n", new StringReader(queryStr));
>>>         ts.reset();
>>>         System.out.println("Tokens are :");
>>>         while (ts.incrementToken()) {
>>>             System.out.print(ts.getAttribute(CharTermAttribute.class) + ", ");
>>>         }
>>>         QueryParser parser = new QueryParser("n", analyzer);
>>>         Query query = null;
>>>         query = parser.parse(queryStr);
>>>         System.out.println("\nQuery is");
>>>         System.out.print(query.toString());
>>>     }
>>> }
>>> 
>>> 
>>> 
>>>> Output:
>>>> Tokens are :
>>>> cup, cup board, board
>>>> Query is n
>>>> n:cup n:board
>>>> 
>>> 
>>> Tokens are printed as expected, and I expected the resulting query to be
>>> *n:cup n:board n:cup board*. But the tokens formed by the shingle filter
>>> are not added to the query; I get only *n:cup n:board*. Where is my mistake?
>>> 
>>> Thanks.
>> 
>> 


