lucene-general mailing list archives

From apgw <>
Subject Re: Lucene rich-text search with returned hyperlinks
Date Mon, 11 May 2009 16:49:49 GMT

The documents are text fields in a database of legal documents, so the search
is not for the document but for the search strings (there will be several)
within a given document. The search strings are derived manually from the
main part of the law, and they would like to match them automatically against
the law's various subsections (legal documents are long and tedious...).
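
For the within-document step, finding the manually derived strings in a
sub-section's text and recording where they occur is essentially a scan for
character offsets. A minimal plain-Java sketch of that step, assuming the text
has already been loaded from the database (this is not Lucene code;
`PhraseLocator`, `findAll` and the sample sentence are hypothetical):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not Lucene: locates several fixed search strings
// inside one document's text and reports their character offsets.
class PhraseLocator {
    /** One occurrence: the search string and its [start, end) offsets. */
    record Match(String phrase, int start, int end) {}

    /**
     * Case-insensitive scan using String.regionMatches, so offsets always
     * refer to the original text (no lowercased copy whose length could differ).
     */
    static List<Match> findAll(String text, List<String> phrases) {
        List<Match> matches = new ArrayList<>();
        for (String phrase : phrases) {
            int len = phrase.length();
            if (len == 0) continue;
            for (int i = 0; i + len <= text.length(); i++) {
                if (text.regionMatches(true, i, phrase, 0, len)) {
                    matches.add(new Match(phrase, i, i + len));
                    i += len - 1; // resume after this occurrence; no overlaps per phrase
                }
            }
        }
        // Order by position so the matches can be linked left to right.
        matches.sort(Comparator.comparingInt(Match::start));
        return matches;
    }

    public static void main(String[] args) {
        String subsection = "The Operator shall notify the Authority. "
                + "The Authority may suspend the operator.";
        for (Match m : findAll(subsection, List.of("operator", "authority"))) {
            System.out.println(m.phrase() + " at " + m.start() + "-" + m.end());
        }
    }
}
```

Matching is exact apart from case, with no word-boundary check, so 'operator'
would also match inside 'cooperator'. Handling plurals, inflections and
near-miss phrasings is where Lucene's analyzers and the span or sloppy phrase
queries discussed in the thread would come in.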

I have more detailed questions (e.g. should the text be indexed when it is
saved, or can this be done quickly enough when the page is requested?), but
this is probably not the right forum for them; I just need to know whether
Lucene can do it. I see a new edition of 'Lucene in Action' is almost ready;
I have the first edition coming in the mail, which I hope will help.

Ted Dunning wrote:
> Yes.  This can be done using Lucene.
>
> But this is subject to a few liberal interpretations of what you asked
> for.  To wit, I am assuming that you want to find interesting documents
> from a bunch of documents, not just search a single document for matches.
>
> The span queries that another poster mentioned would be good, as would
> sloppy phrase queries.
>
> Depending on which European languages you need to handle, there may be
> some work you need to do to deal with morphological analysis.  Lucene has
> reasonable support for English and somewhat more rudimentary support for a
> few other European languages.  Support for Asian languages is very basic
> at best.
>
> On Sun, May 10, 2009 at 7:43 PM, apgw <> wrote:
>> I am new to Lucene. Is this the right utility to use for the following
>> use case:
>> 1) Find a search term, e.g. 'lithium battery', in some technical
>> rich-text data (which can be in any European language, 4K-64K in size),
>> and return the exact position in the text so that the occurrence can be
>> turned into a hyperlink within the text, and the full text returned to
>> the user with the embedded hyperlinks, which he can select if he is
>> interested.
>> 2) Also find and hyperlink "lithium batteries" or "lithium hydride
>> batteries" (with lower ranking), and so on.
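
Requirements 1) and 2) above can be sketched roughly as follows, in plain
Java rather than Lucene: record word offsets, fold simple plurals, allow a
limited number of extra words inside a phrase, score looser matches lower,
and wrap each match in a link. Everything here is an assumption for
illustration: `PhraseLinker`, the naive English-only `normalize` (a stand-in
for the stemming a real Lucene analyzer would do), the 1/(1 + skipped words)
score and the `#lithium-battery` link target. The `slop` parameter only
loosely mirrors Lucene's sloppy phrase queries, whose actual semantics and
scoring differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stdlib-only illustration (not Lucene): word offsets, naive plural folding,
// gap-tolerant phrase matching and link insertion.
class PhraseLinker {
    /** A word with its normalised form and [start, end) offsets in the text. */
    record Token(String norm, int start, int end) {}

    /** A matched span of the original text with a score in (0, 1]. */
    record Hit(int start, int end, double score) {}

    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    /** Naive English plural folding; an assumed stand-in for a real stemmer. */
    static String normalize(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() > 3 && w.endsWith("ies")) return w.substring(0, w.length() - 3) + "y";
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) return w.substring(0, w.length() - 1);
        return w;
    }

    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            tokens.add(new Token(normalize(m.group()), m.start(), m.end()));
        }
        return tokens;
    }

    /**
     * Finds the query words in order, allowing at most `slop` extra words in
     * total between them. Adjacent words score 1.0; each skipped word lowers it.
     */
    static List<Hit> find(String text, String query, int slop) {
        List<Token> tokens = tokenize(text);
        List<Token> q = tokenize(query);
        List<Hit> hits = new ArrayList<>();
        if (q.isEmpty()) return hits;
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).norm().equals(q.get(0).norm())) continue;
            int pos = i;
            int gaps = 0;
            boolean matched = true;
            for (int k = 1; k < q.size(); k++) {
                int found = -1;
                // Look ahead only as far as the remaining slop allows.
                for (int j = pos + 1; j < tokens.size() && j <= pos + 1 + (slop - gaps); j++) {
                    if (tokens.get(j).norm().equals(q.get(k).norm())) {
                        found = j;
                        break;
                    }
                }
                if (found < 0) {
                    matched = false;
                    break;
                }
                gaps += found - pos - 1;
                pos = found;
            }
            if (matched) {
                hits.add(new Hit(tokens.get(i).start(), tokens.get(pos).end(), 1.0 / (1 + gaps)));
                i = pos; // resume after this hit so hits never overlap
            }
        }
        return hits;
    }

    /** Wraps each hit in an anchor; the text itself is not HTML-escaped here. */
    static String link(String text, List<Hit> hits, String href) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Hit h : hits) {
            out.append(text, last, h.start())
               .append("<a href=\"").append(href).append("\">")
               .append(text, h.start(), h.end())
               .append("</a>");
            last = h.end();
        }
        return out.append(text, last, text.length()).toString();
    }

    public static void main(String[] args) {
        String text = "Lithium batteries differ from lithium hydride batteries.";
        List<Hit> hits = find(text, "lithium battery", 1);
        for (Hit h : hits) {
            System.out.println(text.substring(h.start(), h.end()) + " " + h.score());
        }
        System.out.println(link(text, hits, "#lithium-battery"));
    }
}
```

With a slop of 1, the sample sentence yields 'Lithium batteries' with score
1.0 and 'lithium hydride batteries' with score 0.5; with a slop of 0, only the
first is found. The linked output is not HTML-escaped, so real text would need
escaping before being sent to a browser.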

Sent from the Lucene - General mailing list archive at
