lucene-java-user mailing list archives

From Mark Miller
Subject Re: Getting matched words for PhraseQuery or SpanNearQuery
Date Tue, 28 Apr 2009 12:21:40 GMT
The Span Highlighter gets positions by attempting to convert a standard 
Lucene Query to an approximate SpanQuery, and then calling getSpans on the 
span query to find the start and end positions (getSpans is called against a 
fast single-document MemoryIndex). You might check out 
WeightedSpanTermExtractor in the Highlighter package. It may be a bit 
hard to navigate for a new user, though.
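
To make the idea of per-match start/end positions concrete, here is a minimal
sketch in plain Java for the [a b] example from the question. It does not use
the Lucene API: PhraseSpanSketch and findPhraseSpans are hypothetical names
invented for illustration, and the sketch only handles exact, in-order phrases
over whitespace-split tokens (no slop, no analyzer). The end position is
exclusive here, purely as a convention chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of Lucene: reports one {start, end} pair
// of token positions per phrase match, so that each "hit" is kept together.
final class PhraseSpanSketch {

    /** Returns {start, end} token positions (end exclusive) for every exact match of phrase. */
    static List<int[]> findPhraseSpans(List<String> tokens, List<String> phrase) {
        List<int[]> spans = new ArrayList<>();
        if (phrase.isEmpty()) {
            return spans; // an empty phrase matches nothing in this sketch
        }
        // Slide a window the size of the phrase over the token list.
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            int end = start + phrase.size();
            if (tokens.subList(start, end).equals(phrase)) {
                spans.add(new int[] {start, end});
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        // Assumption: simple whitespace tokenization stands in for an analyzer.
        List<String> tokens = Arrays.asList("a b c a d e f a b".split(" "));
        List<String> phrase = Arrays.asList("a", "b");
        for (int[] span : findPhraseSpans(tokens, phrase)) {
            System.out.println("match at tokens [" + span[0] + ", " + span[1] + ")");
        }
    }
}
```

For the text in the question this reports two matches, at token positions
[0, 2) and [7, 9), which is the kind of grouped per-hit information being
asked for.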

- Mark

Jaco wrote:
> Hello,
> I am pretty new to the Lucene API, and there's something I can't figure out
> from the docs and from the mailing list archives. I hope somebody can point
> me in the right direction. Here's my case: for text analysis purposes I am
> doing PhraseQueries and SpanNearQueries. Using the highlighter, I can
> extract text snippets with matching words marked.
> What I really am looking for is to extract information on each match to the
> query, if possible including position information in the text. For example,
> if the text I am searching in is [a b c a d e f a b], and my query is [a b],
> then I want to know where the words [a b] were matched together in the text
> due to the use of the PhraseQuery/SpanNearQuery ([a b] will get me two
> occurrences in the document's text).
> As far as I can find out, the highlighter is capable of marking the
> individual words causing the hit, but it can't show me which words together
> form one 'hit' to the search text. Is there a way to do this with the Lucene
> API? Any help would be appreciated!
> Thanks in advance, bye,
> Jaco.
> PS this is a follow up for this thread in the Solr user mailing list:
