lucene-java-user mailing list archives

From Manjula Wijewickrema
Subject Why hit is 0 for bigrams?
Date Tue, 08 Jul 2014 04:30:57 GMT

I tried to index bigrams from a document, and the system gave me
the following output with the frequencies of the bigrams (output 1):

array size:15
array terms are:{contents: /1, assist librarian/1, assist manjula/2, assist
sabaragamuwa/1, fine manjula/1, librari manjula/1, librarian
sabaragamuwa/1, main librari/2, manjula assist/4, manjula fine/1, manjula
name/1, name manjula/1, sabaragamuwa univers/3, univers main/2, univers

For this I used the following code in the createIndex() class:

ShingleAnalyzerWrapper sw = new ShingleAnalyzerWrapper(analyzer, 2);
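
To make clear what a maximum shingle size of 2 is expected to produce, here is a minimal plain-Java sketch that forms bigrams from an already tokenized and stemmed word list. It does not use Lucene at all; the class name `BigramSketch` and the method `bigrams` are hypothetical, and the real ShingleAnalyzerWrapper may additionally emit the single words, depending on its settings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
public class BigramSketch {

    // Joins each pair of adjacent tokens with a single space.
    public static List<String> bigrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens =
                Arrays.asList("sabaragamuwa", "univers", "main", "librari");
        // prints [sabaragamuwa univers, univers main, main librari]
        System.out.println(bigrams(tokens));
    }
}
```

Each bigram here is one string containing a space, which is the form the terms take in output 1.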


Then I tried to search the indexed bigrams of the same document using the
following code in the searchIndex() class:

IndexReader indexReader =;

IndexSearcher indexSearcher = new IndexSearcher(indexReader);

Analyzer analyzer = new WhitespaceAnalyzer();

QueryParser queryParser = new QueryParser(FIELD_CONTENTS, analyzer);

Query query = queryParser.parse(terms[pos[freqs.length-q1]]);

System.out.println("Query: " + query);

Hits hits =;

System.out.println("Number of hits: " + hits.length());

For this, the system gave me the following output (output2):

Query: contents:manjula contents:assist

Number of hits: 0

Query: contents:sabaragamuwa contents:univers

Number of hits: 0

Query: contents:univers contents:main

Number of hits: 0

Query: contents:main contents:librari

Number of hits: 0
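
As a rough illustration of one possible reading of output 2, the plain-Java sketch below compares looking up a bigram as a single term with looking up its two words separately. It assumes, purely for illustration, that the index holds only the two-word terms shown in output 1 and no single words; the class `TermMatchSketch` and the method `matches` are hypothetical and are not Lucene API. The real cause may be different and depends on how the index was actually built.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Lucene.
public class TermMatchSketch {

    // Counts how many query terms occur verbatim among the indexed terms.
    public static int matches(Set<String> indexedTerms, String... queryTerms) {
        int count = 0;
        for (String term : queryTerms) {
            if (indexedTerms.contains(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumption: only bigram terms are in the index (as in output 1).
        Set<String> indexed = new HashSet<>(Arrays.asList(
                "manjula assist", "sabaragamuwa univers",
                "univers main", "main librari"));

        // Query text split on whitespace into two separate terms.
        System.out.println(matches(indexed, "manjula assist".split("\\s+"))); // prints 0

        // Query text kept as one two-word term.
        System.out.println(matches(indexed, "manjula assist")); // prints 1
    }
}
```

Under that assumption, 'manjula' and 'assist' taken individually match nothing, while the single term 'manjula assist' does.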

Could someone please explain:

(1) why is 'contents: /1' included in the array as an array element? (output

(2) why does the system return the query as 'contents:manjula
contents:assist' instead of 'manjula assist'? (output 2)

(3) why is the number of hits given as 0 instead of their frequencies? (output

I would highly appreciate your kind reply.

