lucene-java-user mailing list archives

From Rajen Chatterjee <rajen.k.chatter...@gmail.com>
Subject How to make word-N-gram based query and interpolate each N-gram score to obtain final Lucene score
Date Mon, 11 Jan 2016 08:43:42 GMT
Hello Everyone,

I am looking for a way to build *word-N-gram* based queries.
After some searching, I think I need to define an analyzer like this:

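import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.shingle.ShingleFilter;

// Lucene 3.x-style Analyzer: tokenStream(String, Reader) is final from
// Lucene 4.0 on, where createComponents(...) is overridden instead.
public static Analyzer wordNgramAnalyzer(final int minShingle, final int maxShingle) {
    return new Analyzer() {
        @Override
        public TokenStream tokenStream(String fieldName, Reader reader) {
            // Split on whitespace, then join adjacent words into shingles of
            // minShingle..maxShingle words (unigrams are emitted as well).
            return new ShingleFilter(new WhitespaceTokenizer(reader), minShingle, maxShingle);
        }
    };
}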
This analyzer produces unigram, bigram, trigram, ... tokens, which I can
use both at indexing time and at query time.
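For illustration, the tokens such an analyzer emits can be approximated in plain Java. This is a minimal sketch, not Lucene code: the class `WordShingles` and its `shingles` method are hypothetical. It assumes ShingleFilter's default behaviour of emitting each unigram followed by the shingles of size minShingle..maxShingle that start at the same word, and it mirrors ShingleFilter's rule that the minimum shingle size must be at least 2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WordShingles {
    /**
     * Hypothetical helper: splits text on whitespace and returns, for each
     * start position, the unigram followed by every word n-gram of size
     * minShingle..maxShingle that starts there (roughly what a whitespace
     * tokenizer plus a shingle filter with unigram output produces).
     */
    static List<String> shingles(String text, int minShingle, int maxShingle) {
        if (minShingle < 2 || maxShingle < minShingle) {
            throw new IllegalArgumentException("need 2 <= minShingle <= maxShingle");
        }
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            out.add(words[start]); // unigram
            for (int n = minShingle; n <= maxShingle && start + n <= words.length; n++) {
                // n consecutive words joined by a single space
                out.add(String.join(" ", Arrays.copyOfRange(words, start, start + n)));
            }
        }
        return out;
    }
}
```

For example, `shingles("please divide this", 2, 3)` yields please, please divide, please divide this, divide, divide this, this. Because the same analyzer runs at index and query time, a query shingle such as "divide this" can only match documents containing those two words adjacently.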
So, can anyone please tell me:
1) Is this the right approach to indexing and querying word N-grams?
2) Is there any way to assign weights to the N-grams, so that at query time
trigram tokens get a higher weight than unigram tokens? Ideally the final
Lucene score would be an interpolation of the unigram score, bigram score,
trigram score, and so on.
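The interpolation described in question 2 is a weighted sum, score = λ1·s1 + λ2·s2 + λ3·s3, where s_n is the score contributed by n-gram tokens. Below is a minimal sketch in plain Java, assuming hypothetical weights (0.2, 0.3, 0.5) and a hypothetical class `NgramInterpolation`; it is not a Lucene API. In Lucene, a rough equivalent would be to give each n-gram query clause a boost of λ_n, although how boosted clauses actually combine depends on the Lucene version's scoring (for example the coord factor in older BooleanQuery scoring).

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class NgramInterpolation {
    // Hypothetical weights lambda_n for n = 1, 2, 3 (index 0 unused).
    // They sum to 1, so interpolate() is a convex combination of per-order scores.
    static final double[] LAMBDA = {0.0, 0.2, 0.3, 0.5};

    /** Order of a shingle = number of whitespace-separated words in it. */
    static int order(String shingle) {
        return shingle.trim().split("\\s+").length;
    }

    /** Per-shingle boost taken from LAMBDA; orders above 3 reuse the trigram weight. */
    static Map<String, Double> boosts(Iterable<String> shingles) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String s : shingles) {
            result.put(s, LAMBDA[Math.min(order(s), LAMBDA.length - 1)]);
        }
        return result;
    }

    /** Linear interpolation: sum over n of LAMBDA[n] * scoreByOrder[n]. */
    static double interpolate(double[] scoreByOrder) {
        double total = 0.0;
        for (int n = 1; n < LAMBDA.length && n < scoreByOrder.length; n++) {
            total += LAMBDA[n] * scoreByOrder[n];
        }
        return total;
    }
}
```

With these weights the shingle "please divide this" (order 3) gets boost 0.5 while "please" gets 0.2, and `interpolate(new double[]{0.0, 1.0, 2.0, 4.0})` evaluates to approximately 2.8.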

Any help is much appreciated.

Thanks

-- 
-Regards,
 Rajen Chatterjee.
