lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mark harwood <>
Subject Re: Finding out which field caused the search hit
Date Mon, 10 Sep 2007 09:14:47 GMT
>> the answer has never been satisfactory

Is this the original question?

What actually formed the basis of a document match is hidden in a tree of
heterogeneous Query objects and to be efficient their match output is
limited to document ids and scores-  not some detailed analysis of
which sections of a document matched. It is therefore hard/impossible
to have any highlighting solution which provides answers for all query
types and the existing Highlighter relies on a rough heuristic where QueryTermExtractor output
is used to find the list of query terms used and field TokenStreams are analyzed for content.

You mention Highlighter performance would be bad for wildcard queries. Have you tried it?
If it does turn out to be bad (many wildcard variants produced) might I suggest the following:

1) Dissect the unrewritten Query and find all WildcardQuery objects
2) Create a custom analyzer that re-implements the wildcard logic and produces a highlighter-friendly
token stream i,e
    Given a query of 
            Fred W*
    and data of
            Fred West was arrested
    the analyzer would produce:
            Fred [W*|West] was arrested
    ..where the tokens "W*" and "West" appear at the same position
3) Add a special wildcard term (W*) to the list of Query terms given to the Highlighter. This
would then match with the W* injected into the content in step 2)

This would avoid the overhead of picking through all the wildcard variants produced by the
wildcardQuery but at the cost of extra coding on your part and the runtime cost of re-executing
wildcard logic on all terms in the selected documents' TokenStreams. The difference in runtime
cost may prove minimal.


Yahoo! Answers - Got a question? Someone out there knows the answer. Try it

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message