lucene-general mailing list archives

From apgw <anth...@databaserepublic.com>
Subject Re: Lucene rich-text search with returned hyperlinks
Date Fri, 22 May 2009 19:04:37 GMT

I will try Lucene rather than hand-coding, in order to keep features such as
stemming, stop-word removal, fuzzy queries and some European language
support. It may also be a starting point for using it more later on (I
suspect this will happen if the initial try-out goes well).
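
For comparison, the hand-coded route for the basic use case in this thread
(find a phrase, report its exact position, wrap each occurrence in a
hyperlink) might look roughly like the sketch below. It is only an
illustration under stated assumptions: the class and method names
(TermLinker, findSpans, link) are made up here, the text is assumed to be
plain text rather than markup, and it uses nothing but java.util.regex, not
Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: literal phrase matching only.
final class TermLinker {

    /** Start/end character offsets of one match (end is exclusive). */
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    // Builds a case-insensitive pattern for a literal phrase, allowing any
    // run of whitespace between its words and requiring word boundaries.
    static Pattern phrasePattern(String phrase) {
        String[] words = phrase.trim().split("\\s+");
        StringBuilder regex = new StringBuilder("\\b");
        for (int i = 0; i < words.length; i++) {
            if (i > 0) regex.append("\\s+");
            regex.append(Pattern.quote(words[i]));
        }
        regex.append("\\b");
        return Pattern.compile(regex.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    /** Returns the exact position of every occurrence of the phrase. */
    static List<Span> findSpans(String text, String phrase) {
        List<Span> spans = new ArrayList<>();
        Matcher m = phrasePattern(phrase).matcher(text);
        while (m.find()) {
            spans.add(new Span(m.start(), m.end()));
        }
        return spans;
    }

    /** Returns the full text with each occurrence wrapped in an anchor. */
    static String link(String text, String phrase, String href) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Span s : findSpans(text, phrase)) {
            out.append(text, last, s.start)
               .append("<a href=\"").append(href).append("\">")
               .append(text, s.start, s.end)
               .append("</a>");
            last = s.end;
        }
        return out.append(text.substring(last)).toString();
    }
}
```

A literal matcher like this finds 'lithium battery' but not 'lithium
batteries', and it has no ranking, stemming or fuzzy matching, which is the
part I would rather not write by hand.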

I was expecting the highlight package (org.apache.lucene.search.highlight)
to be included in the jar file, but I don't see it there (this is the
current version, 2.4.1). Is it in a separate download somewhere? The package
includes classes such as Highlighter and QueryScorer (all listed at
http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/highlight/package-summary.html).



Ted Dunning wrote:
> 
> You could turn this inside out and get the result you want, I think.
> 
> If you index each document with separate Lucene fields for each document,
> then you can start with a search for all documents that have the text you
> want to find in the fields you care about.  Then, all of the documents
> that you have will have the text you want.
> 
> Alternately, if you have one or more documents and want to find out
> whether they have matches against particular fields, you can combine a
> search for the strings you want in the fields you desire with a filter
> that limits the search to the documents in question.
> 
> Mostly, however, it sounds like what you need is a bit different from what
> Lucene is intended to provide which is the ability to search a gazillion
> documents for text relevant to a pretty fuzzy query.  With only one
> document and only a few fields to search, you might be just as well off
> coding the search explicitly.  Lucene could still serve as a nice
> substrate for document storage.
> 
> On Mon, May 11, 2009 at 9:49 AM, apgw <anthony@databaserepublic.com>
> wrote:
> 
>>
>> The documents are text fields in a db of legal docs, so the search is not
>> for the document but for the search string(s) (there will be multiple)
>> within a given document. The search strings are manually derived from the
>> main part, and they would like to match these in the law's various
>> sub-sections automatically (legal docs - long, tedious...).
>>
>> I have more detailed questions (should they be indexed when saved, or
>> can this be done quickly enough when the page is requested?), and so on,
>> but this is probably not the right forum; I just need to know if Lucene
>> will do it. I see there is a new edition of 'Lucene in Action' almost
>> ready; I have the first ed coming in the mail, which I hope will help.
>>
>>
>> Ted Dunning wrote:
>> >
>> > Yes.  This can be done using Lucene.
>> >
>> > But, this is subject to a few liberal interpretations of what you asked
>> > for.  To wit, I am assuming that you want to find interesting documents
>> > from a bunch of documents, not just search a single document for
>> > matches.
>> >
>> > The span queries that another poster mentioned would be good as would
>> > sloppy phrase queries.
>> >
>> > Depending on which European languages you need to handle, there may be
>> > some work you need to do to deal with morphological analysis.  Lucene
>> > has reasonable support for English and somewhat more rudimentary
>> > support for a few other European languages.  Support for Asian
>> > languages is very basic at best.
>> >
>> > On Sun, May 10, 2009 at 7:43 PM, apgw <anthony@databaserepublic.com>
>> > wrote:
>> >
>> >>
>> >> I am new to Lucene. Is this the right utility to use for the following
>> >> use case:
>> >>
>> >> 1) Find a search term - e.g. 'lithium battery' - in some technical
>> >> rich-text data (can be in any European language), 4K - 64K size, and
>> >> return the exact position in the text so that the occurrence can be
>> >> turned into a hyperlink within the text, and the full text returned to
>> >> the user with the embedded hyperlinks which he can select if he is
>> >> interested.
>> >>
>> >> 2) Also find and hyperlink "lithium batteries", or "lithium hydride
>> >> batteries" (with lower ranking) and so on.
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Lucene-rich-text-search-with-returned-hyperlinks-tp23476377p23487090.html
>> Sent from the Lucene - General mailing list archive at Nabble.com.
>>
>>
> 
> 
> -- 
> Ted Dunning, CTO
> DeepDyve
> 
> 111 West Evelyn Ave. Ste. 202
> Sunnyvale, CA 94086
> www.deepdyve.com
> 858-414-0013 (m)
> 408-773-0220 (fax)
> 
> 

-- 
View this message in context: http://www.nabble.com/Lucene-rich-text-search-with-returned-hyperlinks-tp23476377p23676291.html
Sent from the Lucene - General mailing list archive at Nabble.com.

