lucy-user mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [lucy-user] Identifying relevant field in $hits
Date Tue, 04 Oct 2011 03:26:00 GMT
On Mon, Oct 03, 2011 at 04:06:08PM +0200, goran kent wrote:
> Hi,
> 
> I've scrounged around a bit, and I take it
> http://mail-archives.apache.org/mod_mbox/incubator-lucy-dev/201109.mbox/%3C4E6C3489.6090005@peknet.com%3E
> is the only way to identify which field triggered a $hit, right?

I don't know of a better way.

> ie (roughly), flag all fields as highlightable, then if the
> Lucy::Highlight::Highlighter actually highlights something in a field,
> then that's your indication that something was found in it?
 
Yes.

> If so, feature request:  my @field_hit = $hits->relevant_field() would
> be really nice ;)

I don't think this feature is mature enough to be given a prominent core API
just yet.  We've struck upon the general approach of post-processing the hit
using the single-document mini-inverted-indexes needed for highlighting, but
the current implementation is arguably an abuse of Highlighter.

For now, I think it's OK that we support this feature with cookbook code or
via convenience methods in libraries which wrap Lucy.  But it's become a
popular feature request, and so it's good to think about what a Lucy API might
look like in the future.

Peter has provided one vision, in SWISH::Prog::Lucy::Results.  I confess that
I don't quite understand what you've shown us above.  Can you provide some
context illustrating how it would be used?

> My minor problem is:  I have inbound link text pointing to a page
> which is indexed along with the page content itself.  Since it's never
> displayed, you might have a hit on this 'hidden' text (but highly
> relevant in my case) and no other hits, so the excerpt is void of any
> highlighting (I can just hear the wailing and gnashing of teeth from
> my future users).  It would be nice to be able to flag this kind of
> search result as "Found your term in inbound text" or whatever).

OK, sure.  You can abuse Highlighter to achieve your ends.  :)

I assume that you have stripped all HTML tags from your data.  (They would
likely mess up scoring if left in.)  Thus, checking whether a highlighted
excerpt contains a "<strong>" tag suffices to indicate that the field matched.

If the primary content field produces an excerpt that does *not* contain
"<strong>", but the "inbound_text" excerpt *does* contain "<strong>", then you
know to flag that particular result.
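
A minimal sketch of that check, operating only on excerpt strings that have
already been generated.  The helper names fields_with_highlight() and
matched_only_in_hidden_field() are hypothetical and not part of Lucy, and the
sample excerpts are made up.  It assumes one Lucy::Highlight::Highlighter per
field, each left at its default "<strong>" pre_tag, with excerpts obtained
via create_excerpt().

```perl
use strict;
use warnings;

# Hypothetical helper (not a Lucy API): given a hashref of
# field name => excerpt string, return the sorted names of the fields
# whose excerpt contains the highlighter's "<strong>" tag.
sub fields_with_highlight {
    my ($excerpts) = @_;
    my @matched = grep {
        defined $excerpts->{$_}
            && index( $excerpts->{$_}, '<strong>' ) >= 0
    } keys %$excerpts;
    my @sorted = sort { $a cmp $b } @matched;
    return @sorted;
}

# Hypothetical helper: true (1) when the hidden field matched but the
# displayed field did not, i.e. the result deserves a special flag.
sub matched_only_in_hidden_field {
    my ( $excerpts, $visible_field, $hidden_field ) = @_;
    my %hit = map { $_ => 1 } fields_with_highlight($excerpts);
    return ( $hit{$hidden_field} && !$hit{$visible_field} ) ? 1 : 0;
}

# Made-up excerpts standing in for what each field's Highlighter would
# return from create_excerpt($hit) inside the usual hits loop.
my %excerpts = (
    content      => 'Plain page text with nothing highlighted.',
    inbound_text => 'Link to the <strong>lucy</strong> docs',
);

print "Found your term in inbound text\n"
    if matched_only_in_hidden_field( \%excerpts, 'content', 'inbound_text' );
```

This only works if the tags really were stripped at index time.  A literal
"<strong>" surviving in the stored field text would produce a false positive.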

Cheers,

Marvin Humphrey

