lucene-java-user mailing list archives

From Jahangir Anwari <jah...@gmail.com>
Subject Extracting span terms using WeightedSpanTermExtractor
Date Wed, 06 Jul 2011 21:34:42 GMT
I have a CustomHighlighter that extends SolrHighlighter and overrides the
doHighlighting() method. For each document I am trying to extract the span
terms so that I can later use them to get the span positions. I tried to get
the weighted span terms using WeightedSpanTermExtractor, but the map it
returns is always empty. Below is the code that I have. Is there something
missing that needs to be added to get the span terms?

// in CustomHighlighter.java
@Override
public NamedList doHighlighting(DocList docs, Query query, SolrQueryRequest req,
    String[] defaultFields) throws IOException {

  NamedList highlightedSnippets = super.doHighlighting(docs, query, req, defaultFields);

  IndexReader reader = req.getSearcher().getIndexReader();

  String[] fieldNames = getHighlightFields(query, req, defaultFields);
  for (String fieldName : fieldNames) {
    QueryScorer scorer = new QueryScorer(query, null);
    scorer.setExpandMultiTermQuery(true);
    scorer.setMaxDocCharsToAnalyze(51200);

    DocIterator iterator = docs.iterator();
    for (int i = 0; i < docs.size(); i++) {
      int docId = iterator.nextDoc();
      System.out.println("DocId: " + docId);
      TokenStream tokenStream = TokenSources.getTokenStream(reader, docId, fieldName);
      WeightedSpanTermExtractor wste = new WeightedSpanTermExtractor(fieldName);
      wste.setExpandMultiTermQuery(true);
      wste.setWrapIfNotCachingTokenFilter(true);

      // this is always empty
      Map<String,WeightedSpanTerm> weightedSpanTerms =
          wste.getWeightedSpanTerms(query, tokenStream, fieldName);
      System.out.println("weightedSpanTerms: " + weightedSpanTerms.values());
    }
  }
  return highlightedSnippets;
}
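
To make the end goal concrete, here is a minimal, library-free sketch of what
"using the span positions" could look like once the extraction works. It is not
Lucene code: the Span class and the positionsInSpans() method are hypothetical
stand-ins, and it assumes each span is an inclusive [start, end] range of token
positions.

```java
import java.util.ArrayList;
import java.util.List;

public class SpanPositionsSketch {

    /** Hypothetical stand-in for a span: inclusive start and end token positions. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        /** True if the token position lies inside this span (both ends inclusive). */
        boolean contains(int position) {
            return position >= start && position <= end;
        }
    }

    /** Returns the token positions that fall inside at least one span, in input order. */
    static List<Integer> positionsInSpans(List<Integer> tokenPositions, List<Span> spans) {
        List<Integer> hits = new ArrayList<>();
        for (int position : tokenPositions) {
            for (Span span : spans) {
                if (span.contains(position)) {
                    hits.add(position);
                    break; // record each position once, even if spans overlap
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Span> spans = new ArrayList<>();
        spans.add(new Span(3, 5));
        spans.add(new Span(10, 10));

        List<Integer> tokenPositions = new ArrayList<>();
        for (int p : new int[] {1, 4, 5, 9, 10}) {
            tokenPositions.add(p);
        }

        // prints [4, 5, 10]
        System.out.println(positionsInSpans(tokenPositions, spans));
    }
}
```

In the real highlighter the spans would come from the extracted
WeightedSpanTerm values rather than being built by hand as they are here.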

Thanks,
Jahangir
