lucene-dev mailing list archives

From Martin Haye ...@snyder-haye.com>
Subject Re: multi-field highlighting
Date Fri, 06 May 2005 19:38:36 GMT
As part of my work on XTF for the California Digital Library, I've written such a highlighter.
You can see it in action here:

	http://texts.cdlib.org/escholarship/

It supports multi-field highlighting, and ranks the matches within a document field. It highlights
the extent of the actual hits, as well as the terms within a hit (click on a text hit to see
this highlighting). I think that's what Doug means by "phrasal" matching.

Unfortunately, it involves significant additions to the Lucene core. In essence it relies
on an amped-up span system that is capable of scoring the spans, as well as recording which
spans matched for each document field.
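
As a rough illustration of what "recording which spans matched for each document field" could look like, here is a minimal, self-contained sketch. It is not the XTF code and does not use any Lucene classes; the names SpanRecorderSketch, ScoredSpan, record and topSpans are hypothetical, and the scores are assumed to be supplied by whatever query machinery produced the spans.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: none of these names come from Lucene or XTF. */
public class SpanRecorderSketch {

    /** One matched span: term positions [start, end) in a field, plus a score. */
    public static final class ScoredSpan {
        public final int start;
        public final int end;
        public final float score;

        public ScoredSpan(int start, int end, float score) {
            this.start = start;
            this.end = end;
            this.score = score;
        }
    }

    // Field name -> spans recorded for that field, in insertion order.
    private final Map<String, List<ScoredSpan>> spansByField = new LinkedHashMap<>();

    /** Remember that a span in the given field matched, with the given score. */
    public void record(String field, int start, int end, float score) {
        spansByField.computeIfAbsent(field, f -> new ArrayList<>())
                    .add(new ScoredSpan(start, end, score));
    }

    /** Return up to max spans for one field, highest score first. */
    public List<ScoredSpan> topSpans(String field, int max) {
        List<ScoredSpan> recorded =
                spansByField.getOrDefault(field, Collections.emptyList());
        List<ScoredSpan> sorted = new ArrayList<>(recorded);
        sorted.sort(Comparator.comparingDouble((ScoredSpan s) -> s.score).reversed());
        int limit = Math.max(0, Math.min(max, sorted.size()));
        return new ArrayList<>(sorted.subList(0, limit));
    }
}
```

A highlighter built on something like this would call record() while the query is evaluated and then ask topSpans() for each field it wants to summarize. Queries that do not need highlighting would simply never create a recorder.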

This is the second rev of the code, and it was designed to be contributed back into Lucene.
It's already Apache-licensed, and pretty well documented. I also tried to ensure zero speed
impact for queries that don't need span recording. Here's the project page: http://sourceforge.net/projects/xtf


A few weeks ago I joined the Lucene dev mailing list, and I've been trying to get the lay
of the land before I suggest changes to the Lucene core. Okay, that's only partly true. Actually,
I've never contributed to a project like this before, and have been trying to work up the
courage.

The code is based on 1.4.3; if people are interested, I'll work on a patch to the current
svn trunk. I'll also have to port our test suite over to JUnit.

--Martin


On Fri, 06 May 2005 12:04:25 -0700, Doug Cutting wrote:
> There's a post over at SearchEngineWatch theorizing about how
> Google produces summaries.
> 
> http://forums.searchenginewatch.com/showthread.php?threadid=5448
> 
> Lucene's current highlighter doesn't easily support multi-fields,
> nor does it take phrasal matching into account.  It might be useful
> to have a highlighter API that takes a Document and summarizes all
> of its fields, incorporating their boosts in fragment scores.  
> Thoughts?
> 
> Doug
> 
> 
> --------------------------------------------------------------------
> - To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org


