lucene-general mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Lucene rich-text search with returned hyperlinks
Date Mon, 11 May 2009 16:58:51 GMT
You could turn this inside out and get the result you want, I think.

If you index each document with a separate Lucene field for each part of
the document (each sub-section, say), then you can start with a search for
all documents that have the text you want to find in the fields you care
about.  Every document that comes back will then contain the text you want.

Alternatively, if you have one or more documents and want to find out
whether they have matches against particular fields, you can combine a
search for the strings you want in the fields you desire with a filter that
limits the search to the documents in question.

Mostly, however, it sounds like what you need is a bit different from what
Lucene is intended to provide, which is the ability to search a gazillion
documents for text relevant to a pretty fuzzy query.  With only one document
and only a few fields to search, you might be just as well off coding the
search explicitly.  Lucene could still serve as a nice substrate for
document storage.
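
As a rough illustration of what "coding the search explicitly" might look
like for a single document, here is a minimal sketch in plain Java with no
Lucene involved.  The class name ExplicitSearch, the methods findHits and
linkify, and the "#hit" link prefix are all made up for this example.  It
assumes the text is plain (no existing markup to escape) and that each
phrase starts and ends with a letter or digit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: finds phrases in one document
// and wraps each occurrence in a hyperlink.
final class ExplicitSearch {

    /** Character span [start, end) of one match within the searched text. */
    static final class Hit {
        final int start;
        final int end;
        Hit(int start, int end) { this.start = start; this.end = end; }
    }

    /** Returns every case-insensitive, whole-word occurrence of any phrase. */
    static List<Hit> findHits(String text, List<String> phrases) {
        List<Hit> hits = new ArrayList<>();
        // Try longer phrases first so the longest one wins at a given position.
        List<String> sorted = new ArrayList<>(phrases);
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));
        StringBuilder alternation = new StringBuilder();
        for (String phrase : sorted) {
            if (phrase.isEmpty()) continue;          // ignore empty phrases
            if (alternation.length() > 0) alternation.append('|');
            alternation.append(Pattern.quote(phrase)); // match literally
        }
        if (alternation.length() == 0) return hits;
        // UNICODE_CHARACTER_CLASS makes \b and case folding work for
        // accented letters, which matters for non-English European text.
        Pattern pattern = Pattern.compile("\\b(?:" + alternation + ")\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            hits.add(new Hit(m.start(), m.end()));
        }
        return hits;
    }

    /** Returns the text with each hit wrapped in an anchor tag. */
    static String linkify(String text, List<String> phrases, String hrefPrefix) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;   // end of the last span copied to the output
        int n = 0;        // running hit number, appended to the link target
        for (Hit hit : findHits(text, phrases)) {
            out.append(text, cursor, hit.start);
            out.append("<a href=\"").append(hrefPrefix).append(n++).append("\">");
            out.append(text, hit.start, hit.end);
            out.append("</a>");
            cursor = hit.end;
        }
        out.append(text, cursor, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Use a lithium battery, not two Lithium Batteries.";
        System.out.println(linkify(text, List.of("lithium battery"), "#hit"));
    }
}
```

Note that this only matches the phrases exactly as given (ignoring case), so
"lithium batteries" would have to be listed as its own phrase.  Handling
inflected forms or near matches automatically is where Lucene's analyzers
and phrase queries would start to earn their keep.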

On Mon, May 11, 2009 at 9:49 AM, apgw <anthony@databaserepublic.com> wrote:

>
> The documents are text fields in a db of legal docs, so the search is not
> for the document but for the search string(s) (there will be multiple)
> within a given document. The search strings are manually derived from the
> main part, and they would like to match these in the law's various
> sub-sections automatically (legal docs - long, tedious...).
>
> I have more detailed questions (should they be indexed when saved, or can
> this be done quickly enough when the page is requested?), and so on, but
> this is probably not the right forum; I just need to know if Lucene will do
> it. I see there is a new edition of 'Lucene in Action' almost ready; I
> have the first ed coming in the mail, which I hope will help.
>
>
> Ted Dunning wrote:
> >
> > Yes.  This can be done using Lucene.
> >
> > But, this is subject to a few liberal interpretations of what you asked
> > for.  To wit, I am assuming that you want to find interesting documents
> > from a bunch of documents, not just search a single document for matches.
> >
> > The span queries that another poster mentioned would be good as would
> > sloppy phrase queries.
> >
> > Depending on which European languages you need to handle, there may be
> > some work you need to do to deal with morphological analysis.  Lucene has
> > reasonable support for English and somewhat more rudimentary support for
> > a few other European languages.  Support for Asian languages is very
> > basic at best.
> >
> > On Sun, May 10, 2009 at 7:43 PM, apgw <anthony@databaserepublic.com>
> > wrote:
> >
> >>
> >> I am new to Lucene. Is this the right utility to use for the following
> >> use case:
> >>
> >> 1) Find a search term, e.g. 'lithium battery', in some technical
> >> rich-text data (can be in any European language), 4K - 64K in size, and
> >> return the exact position in the text so that the occurrence can be
> >> turned into a hyperlink within the text, and the full text returned to
> >> the user with the embedded hyperlinks, which he can select if he is
> >> interested.
> >>
> >> 2) Also find and hyperlink "lithium batteries", or "lithium hydride
> >> batteries" (with lower ranking) and so on.
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Lucene-rich-text-search-with-returned-hyperlinks-tp23476377p23487090.html
> Sent from the Lucene - General mailing list archive at Nabble.com.
>
>


-- 
Ted Dunning, CTO
DeepDyve

111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
www.deepdyve.com
858-414-0013 (m)
408-773-0220 (fax)
