lucene-java-user mailing list archives

From Michael Sokolov <soko...@ifactory.com>
Subject Re: Extracting span terms using WeightedSpanTermExtractor
Date Thu, 07 Jul 2011 00:28:54 GMT
I tried something similar and failed; I think the API is lacking
there. My only advice is to vote for this issue:
https://issues.apache.org/jira/browse/LUCENE-2878
which should provide a better alternative API, but it is not near completion.
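
In the meantime, one possible workaround is to compute positions directly
from the analyzed tokens rather than going through the highlighter. Below is
a minimal plain-Java sketch of that idea; it is not Lucene API. It assumes
you have already collected a field's tokens into a list of strings, and it
only handles exact phrases (no slop, no position increments). The class
PhraseSpanSketch and the method findSpans are hypothetical names made up for
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; it does not use any Lucene classes.
public class PhraseSpanSketch {

    /**
     * Returns the [start, end] token positions (both inclusive) of every
     * exact, consecutive occurrence of the phrase in the token list.
     */
    public static List<int[]> findSpans(List<String> tokens, List<String> phrase) {
        List<int[]> spans = new ArrayList<int[]>();
        if (phrase.isEmpty()) {
            return spans; // an empty phrase matches nothing
        }
        // Slide a window of the phrase's length over the tokens.
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            int end = start + phrase.size(); // exclusive bound for subList
            if (tokens.subList(start, end).equals(phrase)) {
                spans.add(new int[] { start, end - 1 });
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        // Assumed input: tokens already produced by your analyzer.
        List<String> tokens = Arrays.asList(
                "the", "quick", "brown", "fox", "saw", "the", "quick", "brown", "dog");
        List<String> phrase = Arrays.asList("quick", "brown");
        for (int[] span : findSpans(tokens, phrase)) {
            System.out.println("span: " + span[0] + "-" + span[1]);
        }
    }
}
```

This ignores everything a real span query handles (slop, ordering, multi-term
expansion), so treat it as a stopgap for simple phrase cases only.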

-Mike

On 7/6/2011 5:34 PM, Jahangir Anwari wrote:
> I have a CustomHighlighter that extends SolrHighlighter and overrides
> the doHighlighting() method. For each document, I am trying to extract
> the span terms so that I can later use them to get the span positions. I tried
> to get the weighted span terms using WeightedSpanTermExtractor but was
> unsuccessful. Below is the code that I have. Is there something missing
> that needs to be added to get the span terms?
>
> // in CustomHighlighter.java
> @Override
> public NamedList doHighlighting(DocList docs, Query query,
>     SolrQueryRequest req, String[] defaultFields) throws IOException {
>
>   NamedList highlightedSnippets =
>       super.doHighlighting(docs, query, req, defaultFields);
>
>   IndexReader reader = req.getSearcher().getIndexReader();
>
>   String[] fieldNames = getHighlightFields(query, req, defaultFields);
>   for (String fieldName : fieldNames) {
>     QueryScorer scorer = new QueryScorer(query, null);
>     scorer.setExpandMultiTermQuery(true);
>     scorer.setMaxDocCharsToAnalyze(51200);
>
>     DocIterator iterator = docs.iterator();
>     for (int i = 0; i < docs.size(); i++) {
>       int docId = iterator.nextDoc();
>       System.out.println("DocId: " + docId);
>       TokenStream tokenStream =
>           TokenSources.getTokenStream(reader, docId, fieldName);
>       WeightedSpanTermExtractor wste =
>           new WeightedSpanTermExtractor(fieldName);
>       wste.setExpandMultiTermQuery(true);
>       wste.setWrapIfNotCachingTokenFilter(true);
>
>       // this is always empty
>       Map<String,WeightedSpanTerm> weightedSpanTerms =
>           wste.getWeightedSpanTerms(query, tokenStream, fieldName);
>       System.out.println("weightedSpanTerms: " + weightedSpanTerms.values());
>     }
>   }
>   return highlightedSnippets;
> }
>
> Thanks,
> Jahangir
>



