lucene-dev mailing list archives

From Lee Mallabone <>
Subject Re: HighLighting Service
Date Wed, 10 Apr 2002 08:46:00 GMT
On Tue, 2002-04-09 at 20:22, none none wrote:

> I am working on the highlight-terms functionality for Lucene.
> Some problems show up here:
> 1. It doesn't work with all Query types, e.g. WildcardQuery, FuzzyQuery, PrefixQuery, PhraseQuery.

One thing I did was to modify LuceneTools (well, I rewrote it
eventually) to output regular expressions instead of just terms. I then
use gnu.regexp or Jakarta ORO to match those expressions against various
forms of the original documents. This allows custom highlighting (i.e.
highlighting entire phrases, not just the tokens in those phrases). It
also makes wildcard matching faster: you generate a single expression
for the wildcard query rather than matching individually against every
term the wildcard query would expand to. I didn't address FuzzyQuery or
date queries.
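
To make the idea concrete, here is a minimal sketch of turning a
wildcard term or a phrase into a single regular expression and using it
to highlight text. It is not the rewritten LuceneTools code: the class
and method names (WildcardRegex, toPattern, phraseToPattern, highlight)
are hypothetical, the <b> markup is an arbitrary choice, and it uses
java.util.regex in place of gnu.regexp or Jakarta ORO.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class WildcardRegex {

    // Translate a wildcard term ('*' = any run of word characters,
    // '?' = exactly one word character) into one whole-word regex.
    static Pattern toPattern(String wildcard) {
        StringBuilder sb = new StringBuilder("\\b");
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append("\\w*");
            } else if (c == '?') {
                sb.append("\\w");
            } else {
                // Quote literal characters so regex metacharacters are harmless.
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        sb.append("\\b");
        return Pattern.compile(sb.toString(), Pattern.CASE_INSENSITIVE);
    }

    // Build one regex for a whole phrase, so the phrase is highlighted
    // as a unit. Assumes the terms are separated by whitespace only.
    static Pattern phraseToPattern(String... terms) {
        StringBuilder sb = new StringBuilder("\\b");
        for (int i = 0; i < terms.length; i++) {
            if (i > 0) {
                sb.append("\\s+");
            }
            sb.append(Pattern.quote(terms[i]));
        }
        sb.append("\\b");
        return Pattern.compile(sb.toString(), Pattern.CASE_INSENSITIVE);
    }

    // Wrap every match of the pattern in <b>...</b>.
    static String highlight(String text, Pattern p) {
        Matcher m = p.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out,
                Matcher.quoteReplacement("<b>" + m.group() + "</b>"));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this sketch, WildcardRegex.highlight(text,
WildcardRegex.toPattern("highlight*")) makes one pass over the text
with one expression, however many indexed terms the wildcard would
expand to.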

> What can we do? Any suggestions?

One method of generating document context is to store the body of your
document in the index. You then retrieve it, normalize any whitespace,
abbreviate the text around the first hit, and highlight the relevant
terms in the abbreviated text. This doesn't sound all that quick, but it
proved to be much quicker than consulting the original document in some
informal (unmeasured) tests I did.
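
Those steps can be sketched roughly as follows. This assumes the stored
body has already been retrieved from the index as a String; the class
name ContextExtract, the method extract, the radius parameter, and the
<b> markup are all hypothetical, and java.util.regex stands in for
gnu.regexp or Jakarta ORO.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class ContextExtract {

    // Returns a short extract centred on the first hit of `term`,
    // keeping `radius` characters of context on each side.
    static String extract(String storedBody, String term, int radius) {
        // 1. Normalize whitespace: collapse every run to a single space.
        String text = storedBody.replaceAll("\\s+", " ").trim();

        Pattern p = Pattern.compile(
            "\\b" + Pattern.quote(term) + "\\b", Pattern.CASE_INSENSITIVE);
        Matcher first = p.matcher(text);

        // No hit: fall back to the start of the document.
        if (!first.find()) {
            return text.substring(0, Math.min(text.length(), 2 * radius));
        }

        // 2. Abbreviate around the first hit. The cut is character-based,
        //    so words at the edges of the window may be truncated.
        int start = Math.max(0, first.start() - radius);
        int end = Math.min(text.length(), first.end() + radius);
        String window = text.substring(start, end);

        // 3. Highlight every occurrence of the term inside the window.
        String marked = p.matcher(window).replaceAll("<b>$0</b>");

        // Mark the ends where text was dropped.
        return (start > 0 ? "..." : "") + marked
             + (end < text.length() ? "..." : "");
    }
}
```

Only the abbreviated window is scanned for highlighting, which keeps
the work per search result small even when the stored body is long.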

That works really well for context extracts. However, it may or may not
be applicable to highlighting the entire document; I think that depends
on the original format of your documents. I still consult the original
(HTML) documents for that, but all my documents are fairly short.

> 3. I think we should incorporate this feature into Lucene. Right now, to make
> this work you have to change some code in the Lucene package, so staying up
> to date requires changing those parts of the code every time (if they are
> still there!!). Also, it strictly depends on the Lucene core
> package.

There are a whole bunch of different ways of implementing highlighting;
not all of them require changes to Lucene's core. I think integrating a
full highlight retrieval system into Lucene that's sufficiently generic
to match with Lucene's architecture might be difficult at best...

> I hope someone can help me with some tips so that I can complete this functionality.

I'm not 100% sure what else you need to do.

For what it's worth, if your current code is sufficient, I'd go with
that. I've refactored a few highlighting systems, and most of them end
up with quite a lot of code, depending on how detailed your spec is.


Lee Mallabone.
