lucenenet-user mailing list archives

From William Young <wyo...@streetdiligence.com>
Subject Getting Per Document Frequencies in Apache Lucenenet 4.8.0.0
Date Tue, 04 Apr 2017 21:52:40 GMT
I'm using this version of Lucene.NET: https://github.com/apache/lucenenet

I'm trying to get the number of phrase matches per document using a
PhraseQuery and an ExactPhraseScorer like so:

// Some phraseQuery defined here

using (IndexReader indexReader = DirectoryReader.Open(IndexerJob.LuceneDirectory))
{
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);

    TopDocs topDocs = indexSearcher.Search(masterQuery, _MAXSEARCHRESULTS);
    var weight = phraseQuery.CreateWeight(indexSearcher);

    var scorers = indexReader.Leaves
        .Select(o => weight.Scorer(o, o.AtomicReader.LiveDocs))
        .Where(o => o != null);
    foreach (var scorer in scorers)
    {
        while (scorer.NextDoc() != DocIdSetIterator.NO_MORE_DOCS)
        {
            int doc = scorer.DocID();
            int freq = scorer.Freq();
            Console.WriteLine("Document {0} contains {1} matches", doc, freq);
        }
    }
}

But the first call to scorer.NextDoc() always returns
DocIdSetIterator.NO_MORE_DOCS, so the body of the while loop is never
executed. I tried the same code with a TermQuery instead of a PhraseQuery,
and it works fine, so the problem appears to be specific to PhraseQuery and
the ExactPhraseScorer.

I looked at the source code, and there seems to be a function in
ExactPhraseScorer:

private int PhraseFreq() { ... }

This function appears to be responsible for calculating the counts per
document. The int[] arrays Counts and Gens are also involved, but I don't
understand what they are doing well enough to diagnose the problem.

Any ideas?

William
