lucene-java-user mailing list archives

From "Nadav Har'El" <...@il.ibm.com>
Subject Re: Get list with found words for a hit?
Date Mon, 27 Feb 2006 11:56:53 GMT
"Samuru Jackson" <samurujackson@googlemail.com> wrote on 27/02/2006 01:50:11 PM:
> Is there a way to retrieve a List of the matching words for a Hit?
> For example I create a query like this one:
> "Paris London -Stockholm"
> ...
> How do I know which words have been found in a document? In one it could be
> Paris, in another it could be London or both!
> I would need this information in order to highlight those words if I display
> the search results to the user.

For the purpose of highlighting, you don't necessarily need to know in advance
which word matched: you can simply highlight every occurrence of either Paris
or London, wherever you find them, in the original text.
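
To make that idea concrete, here is a minimal sketch in plain Java that wraps
every occurrence of the positive query terms in <b> tags. The class name
TermHighlighter, the highlight method and the <b> markup are hypothetical
choices for illustration only and are not part of Lucene. It matches the raw
text with a regular expression, so it knows nothing about the analyzer used
at indexing time (stemming, stop words and so on).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not a Lucene class.
final class TermHighlighter {

    // Wraps each whole-word, case-insensitive occurrence of any term in <b>...</b>.
    static String highlight(String text, List<String> terms) {
        if (terms.isEmpty()) {
            return text;
        }
        // Build an alternation such as \b(?:\QParis\E|\QLondon\E)\b,
        // quoting each term so regex metacharacters are taken literally.
        StringBuilder alternation = new StringBuilder();
        for (String term : terms) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(term));
        }
        Pattern pattern = Pattern.compile(
                "\\b(?:" + alternation + ")\\b", Pattern.CASE_INSENSITIVE);
        // $0 is the matched text itself, so the original casing is preserved.
        return pattern.matcher(text).replaceAll("<b>$0</b>");
    }

    public static void main(String[] args) {
        // Only the positive terms are passed in; "-Stockholm" is excluded.
        System.out.println(highlight("From Paris to London by train",
                Arrays.asList("Paris", "London")));
    }
}
```

A document that contains only one of the terms simply gets that one term
marked, so there is no need to ask the index which term produced the hit.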

You might want to take a look at the Highlighter class in the contrib directory
of Lucene's distribution, which might do what you want. Here is some example
code: it creates a Highlighter object for the given query "q", and then, for
each of the results, it retrieves the full content of the document from the
stored "storedContent" field which I added to the index, finds the 2 most
relevant fragments in the content, and highlights q's words in them (this is
similar to the summaries you see in Google and similar search engines):

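      import java.io.StringReader;
      import org.apache.lucene.analysis.TokenStream;
      import org.apache.lucene.document.Document;
      import org.apache.lucene.search.highlight.Highlighter;
      import org.apache.lucene.search.highlight.QueryScorer;

      // Score fragments by how well they match the terms of the query q.
      Highlighter highlighter = new Highlighter(new QueryScorer(q));
      // Limit how much of each document is analyzed (the constant is my own).
      highlighter.setMaxDocBytesToAnalyze(ArbitraryLimits.DocumentToSaveCutOff);

      // Loop over all hits, or only over the ones you actually display.
      for (int i = 0; i < hits.length(); i++) {
            Document doc = hits.doc(i);
            String content = doc.get("storedContent");
            TokenStream tokenStream = analyzer.tokenStream("storedContent",
                        new StringReader(content));
            // The 2 best fragments, joined by " ... ", with q's terms marked.
            String summary = highlighter.getBestFragments(tokenStream,
                              content, 2, " ... ");
      }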


--
Nadav Har'El

