lucene-general mailing list archives

From apgw <anth...@databaserepublic.com>
Subject Re: Lucene rich-text search with returned hyperlinks
Date Fri, 22 May 2009 19:23:41 GMT

I found the highlighter (in the contrib dir). Thanks for your help.
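
For anyone following the thread: the use case from the first message (find a phrase, get its exact character offsets, wrap each occurrence in a hyperlink) can also be sketched without Lucene, along the lines of the "coding the search explicitly" option Ted mentions below. This is only a rough sketch under stated assumptions, not how the contrib highlighter works: the names HyperlinkTerms, findOffsets and linkify and the href value are hypothetical, the input is assumed to be plain text with no markup to escape, and there is no stemming, so "lithium batteries" is not matched by "lithium battery" (which is the kind of thing Lucene's analyzers are for).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HyperlinkTerms {

    // Build a case-insensitive pattern for a phrase: the words must appear in
    // order, separated by any run of whitespace, and sit on word boundaries.
    public static Pattern phrasePattern(String phrase) {
        if (phrase == null || phrase.trim().isEmpty()) {
            throw new IllegalArgumentException("phrase must not be empty");
        }
        String[] words = phrase.trim().split("\\s+");
        StringBuilder regex = new StringBuilder("\\b");
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                regex.append("\\s+");
            }
            regex.append(Pattern.quote(words[i])); // treat each word literally
        }
        regex.append("\\b");
        return Pattern.compile(regex.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    // Return {start, end} character offsets (end exclusive) of every match.
    public static List<int[]> findOffsets(String text, String phrase) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = phrasePattern(phrase).matcher(text);
        while (m.find()) {
            spans.add(new int[] { m.start(), m.end() });
        }
        return spans;
    }

    // Return the full text with each match wrapped in an <a> element.
    // The href is inserted as given; no HTML escaping is attempted here.
    public static String linkify(String text, String phrase, String href) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (int[] span : findOffsets(text, phrase)) {
            out.append(text, last, span[0]);
            out.append("<a href=\"").append(href).append("\">");
            out.append(text, span[0], span[1]);
            out.append("</a>");
            last = span[1];
        }
        out.append(text.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "A lithium battery stores energy. Lithium  battery packs vary.";
        System.out.println(linkify(text, "lithium battery", "/terms/lithium-battery"));
    }
}
```

Keeping the offsets separate from the link-building step means the same positions could be stored or ranked before any markup is generated.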


apgw wrote:
> 
> I will try Lucene rather than hand-coding in order to keep features like
> stemming, stop-word removal, fuzzy queries, some European language
> support, and so on. It may also be a starting point for using it more
> later on (I suspect this will happen if the initial try-out goes well).
> 
> I was expecting the Highlight package (org.apache.lucene.search.highlight)
> to be included in the jar file, but I don't see it (this is the current
> 2.4.1 version). Is it in a separate download somewhere? The package
> includes the classes Highlighter and QueryScorer (all listed at
> http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/highlight/package-summary.html)
> 
> 
> 
> Ted Dunning wrote:
>> 
>> You could turn this inside out and get the result you want, I think.
>> 
>> If you index each document with separate Lucene fields for each document,
>> then you can start with a search for all documents that have the text you
>> want to find in the fields you care about.  Then, all of the documents
>> that you have will have the text you want.
>> 
>> Alternatively, if you have one or more documents and want to find out
>> whether they have matches against particular fields, you can combine a
>> search for the strings you want in the fields you desire with a filter
>> that limits the search to the documents in question.
>> 
>> Mostly, however, it sounds like what you need is a bit different from
>> what Lucene is intended to provide, which is the ability to search a
>> gazillion documents for text relevant to a pretty fuzzy query.  With only
>> one document and only a few fields to search, you might be just as well
>> off coding the search explicitly.  Lucene could still serve as a nice
>> substrate for document storage.
>> 
>> On Mon, May 11, 2009 at 9:49 AM, apgw <anthony@databaserepublic.com>
>> wrote:
>> 
>>>
>>> The documents are text fields in a db of legal docs, so the search is
>>> not for the document but for the search string(s) (there will be
>>> multiple) within a given document. The search strings are manually
>>> derived from the main part, and they would like to match these in the
>>> law's various sub-sections automatically (legal docs - long, tedious...).
>>>
>>> I have more detailed questions (should they be indexed when saved, or
>>> can this be done quickly enough when the page is requested?), and so on,
>>> but this is probably not the right forum; I just need to know if Lucene
>>> will do it. I see there is a new edition of 'Lucene in Action' almost
>>> ready; I have the first edition coming in the mail, which I hope will
>>> help.
>>>
>>>
>>> Ted Dunning wrote:
>>> >
>>> > Yes.  This can be done using Lucene.
>>> >
>>> > But, this is subject to a few liberal interpretations of what you
>>> > asked for.  To wit, I am assuming that you want to find interesting
>>> > documents from a bunch of documents, not just search a single document
>>> > for matches.
>>> >
>>> > The span queries that another poster mentioned would be good, as would
>>> > sloppy phrase queries.
>>> >
>>> > Depending on which European languages you need to handle, there may be
>>> > some work you need to do to deal with morphological analysis.  Lucene
>>> > has reasonable support for English and somewhat more rudimentary
>>> > support for a few other European languages.  Support for Asian
>>> > languages is very basic at best.
>>> >
>>> > On Sun, May 10, 2009 at 7:43 PM, apgw <anthony@databaserepublic.com>
>>> > wrote:
>>> >
>>> >>
>>> >> I am new to Lucene. Is this the right utility to use for the
>>> >> following use case:
>>> >>
>>> >> 1) Find a search term, e.g. 'lithium battery', in some technical
>>> >> rich-text data (can be in any European language), 4K - 64K in size,
>>> >> and return the exact position in the text so that the occurrence can
>>> >> be turned into a hyperlink within the text, and the full text returned
>>> >> to the user with the embedded hyperlinks, which he can select if he is
>>> >> interested.
>>> >>
>>> >> 2) Also find and hyperlink "lithium batteries", or "lithium hydride
>>> >> batteries" (with lower ranking) and so on.
>>> >>
>>> >>
>>> >
>>> >
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Lucene-rich-text-search-with-returned-hyperlinks-tp23476377p23487090.html
>>> Sent from the Lucene - General mailing list archive at Nabble.com.
>>>
>>>
>> 
>> 
>> -- 
>> Ted Dunning, CTO
>> DeepDyve
>> 
>> 111 West Evelyn Ave. Ste. 202
>> Sunnyvale, CA 94086
>> www.deepdyve.com
>> 858-414-0013 (m)
>> 408-773-0220 (fax)
>> 
>> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Lucene-rich-text-search-with-returned-hyperlinks-tp23476377p23676514.html
Sent from the Lucene - General mailing list archive at Nabble.com.

