lucene-java-user mailing list archives

From Malgorzata Urbanska <urban...@cs.colostate.edu>
Subject Re: ngrams in Lucene 4.3.0
Date Tue, 16 Jul 2013 18:28:15 GMT
Hi,

I built an Indexer with NGramAnalizer, which uses ShingleFilter.

Next I built a Searcher with NGramQuery, which uses BooleanQuery:

    String termToken = charTermAttribute.toString();
    Term t = new Term("content", termToken);
    add(new TermQuery(t), Occur.SHOULD);

It looks like everything works perfectly; however, my searcher does not
find any hits.
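
One thing that may be worth checking here (a guess, not something confirmed in this thread): TermQuery looks a term up exactly as given and does not run an analyzer, so every shingle built on the query side has to be identical, character for character, to a term written at index time. The plain-Java sketch below models that with no Lucene classes at all; ShingleLookupSketch, shingles and exactHits are hypothetical names, and the index is just a set of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Plain-Java model of word shingles and exact term lookup; no Lucene classes are used. */
final class ShingleLookupSketch {

    /** Joins every run of `size` adjacent words with one space (a word n-gram, or shingle). */
    static List<String> shingles(String text, int size) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i + size <= words.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(words, i, i + size)));
        }
        return out;
    }

    /** Counts query terms that appear, character for character, in the indexed term set. */
    static int exactHits(Set<String> indexedTerms, List<String> queryTerms) {
        int hits = 0;
        for (String term : queryTerms) {
            if (indexedTerms.contains(term)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Index side (assumed chain): lowercase first, then build 2-word shingles.
        Set<String> indexed = new LinkedHashSet<>(
                shingles("the quick brown fox".toLowerCase(Locale.ROOT), 2));

        // Query side without lowercasing: "Quick Brown" is not an indexed term.
        System.out.println(exactHits(indexed, shingles("Quick Brown", 2))); // prints 0

        // Query side with the same normalization as the index side.
        System.out.println(exactHits(indexed, shingles("quick brown", 2))); // prints 1
    }
}
```

If the real index and query chains differ in case, shingle size, or the separator between words, the same kind of miss would be expected; whether that is what happens here is an assumption.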

I suspect my indexer code, so I tried to inspect the index, but Luke does
not work with Lucene 4.3.0 :(

Could someone give me a hint about what is happening?
Thanks,
gosia

On Mon, Jul 15, 2013 at 1:45 PM, Malgorzata Urbanska
<urbanska@cs.colostate.edu> wrote:
> thanks !!
>
>
>
> On Mon, Jul 15, 2013 at 1:31 PM, Ivan Krišto <ivan.kristo@gmail.com> wrote:
>> On 07/15/2013 07:50 PM, Malgorzata Urbanska wrote:
>>> Hi,
>>>
>>> I've been trying to figure out how to use ngrams in Lucene 4.3.0.
>>> I found some examples for earlier versions, but I'm still confused.
>>> As I understand it, I should:
>>> 1. create a new analyzer which uses ngrams
>>> 2. apply it to my indexer
>>> 3. search using the same analyzer
>>>
>>> I found NGramTokenFilter and NGramTokenizer in the documentation, but I
>>> do not understand the difference between them.
>> This should be helpful:
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Tokenizers
>>
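>> Here is an example of an n-gram analyzer:
>>
>> import java.io.IOException;
>> import java.io.Reader;
>> import org.apache.lucene.analysis.Analyzer;
>> import org.apache.lucene.analysis.TokenStream;
>> import org.apache.lucene.analysis.Tokenizer;
>> import org.apache.lucene.analysis.core.LowerCaseFilter;
>> import org.apache.lucene.analysis.ngram.NGramTokenizer;
>> import org.apache.lucene.analysis.standard.StandardFilter;
>> import org.apache.lucene.util.Version;
>>
>> public class NGramAnalyzer extends Analyzer {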
>>     @Override
>>     protected TokenStreamComponents createComponents(String fieldName,
>>             Reader reader) {
>>
>>         Tokenizer src = new NGramTokenizer(reader, 3, 3);
>>
>>         TokenStream tok = new StandardFilter(Version.LUCENE_43, src);
>>         tok = new LowerCaseFilter(Version.LUCENE_43, tok);
>>
>>         return new TokenStreamComponents(src, tok) {
>>             @Override
>>             protected void setReader(final Reader reader) throws IOException {
>>                 super.setReader(reader);
>>             }
>>         };
>>     }
>> }
>>
>> If, for example, you want to remove stop words from the document before
>> breaking it into n-grams, then you would need:
>> reader(document) -> SomeTokenizer -> StopFilter -> NGramTokenFilter
>>
>>
>>   Regards,
>>     Ivan Krišto
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
>
>
> --
> Malgorzata Urbanska (Gosia)
> Graduate Assistant
> Colorado State University
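
The difference asked about in the quoted thread, and the pipeline Ivan outlines (tokenizer, then stop filter, then n-gram filter), can be illustrated roughly in plain Java. This is only a sketch under assumptions: it uses no Lucene classes, NGramPipelineSketch and its methods are hypothetical names, and real Lucene components may differ in details such as token order or input limits. The tokenizer-style method cuts n-grams from the whole input, spaces included; the filter-style method splits into words first, drops stop words, and only then cuts n-grams from each remaining word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Plain-Java model of n-gramming the whole input versus n-gramming each token. */
final class NGramPipelineSketch {

    /** All substrings of exactly n characters, left to right. */
    static List<String> charNGrams(String s, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= s.length(); i++) {
            out.add(s.substring(i, i + n));
        }
        return out;
    }

    /** Tokenizer-style: n-grams are cut from the raw input, so they can span spaces. */
    static List<String> ngramWholeInput(String text, int n) {
        return charNGrams(text, n);
    }

    /** Filter-style: split on whitespace, lowercase, drop stop words, then n-gram each token. */
    static List<String> ngramPerToken(String text, Set<String> stopWords, int n) {
        List<String> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            String token = word.toLowerCase(Locale.ROOT);
            if (token.isEmpty() || stopWords.contains(token)) {
                continue; // stop words never reach the n-gram step
            }
            out.addAll(charNGrams(token, n));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a"));

        // Five trigrams, three of which contain the space between the words.
        System.out.println(ngramWholeInput("the fox", 3));

        // "the" is removed first, so only trigrams of "foxes" remain.
        System.out.println(ngramPerToken("The foxes", stopWords, 3)); // prints [fox, oxe, xes]
    }
}
```

The practical consequence is that the two approaches produce different term sets for the same text, so the index and the searcher need to agree on which one is used.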



-- 
Malgorzata Urbanska (Gosia)
Graduate Assistant
Colorado State University


