lucene-general mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: AW: Highlighting, Keywords and Summarizing
Date Tue, 03 May 2005 13:45:33 GMT

On May 3, 2005, at 4:49 AM, Schuh, Stefan wrote:

> Hi,
>
> Thanks for the info.
>
> Keywords are the most important words in an article. Say you
> have an article of 3 to 5 pages: the keywords are its most
> important words, excluding stop words.


Have a look at the Similarity (MoreLikeThis) code in Lucene's
Subversion repository under contrib/similarity.  It does a very nice
job of extracting "important" terms and has a fair bit of flexibility.
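
To give a feel for what extracting "important" terms can mean, here is
a minimal sketch in plain Java. It is not the MoreLikeThis code and it
does not use Lucene at all; it simply assumes a TF-IDF style score
(terms frequent in one document but rare across the collection rank
highest). The class name KeywordSketch, the tiny stop-word list, and
the smoothing in the idf formula are all assumptions made for
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: ranks the terms of one document by a TF-IDF style score. */
final class KeywordSketch {

    // Tiny illustrative stop-word list (an assumption); real lists are language-specific.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "of", "and", "is", "in", "to"));

    /** Lowercases the text, splits on non-letters, and drops stop words. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Returns up to maxTerms terms of doc, highest score first. */
    static List<String> topTerms(String doc, List<String> corpus, int maxTerms) {
        // Term frequency inside the document of interest.
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokenize(doc)) {
            tf.merge(t, 1, Integer::sum);
        }

        // Document frequency: how many corpus documents contain each term.
        Map<String, Integer> df = new HashMap<>();
        for (String other : corpus) {
            for (String t : new HashSet<>(tokenize(other))) {
                df.merge(t, 1, Integer::sum);
            }
        }

        // Smoothed idf (an assumption) so that unseen terms do not divide by zero.
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int docFreq = df.getOrDefault(e.getKey(), 0);
            double idf = Math.log((corpus.size() + 1.0) / (docFreq + 1.0)) + 1.0;
            score.put(e.getKey(), e.getValue() * idf);
        }

        // Sort by descending score; break ties alphabetically for stable output.
        List<String> terms = new ArrayList<>(score.keySet());
        terms.sort((a, b) -> {
            int byScore = Double.compare(score.get(b), score.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(maxTerms, terms.size())));
    }
}
```

Calling KeywordSketch.topTerms(article, allArticles, 10) would give the
ten highest-scoring terms of one article. For keywords in different
languages, the stop-word list and the tokenizer are the parts that
would need to be swapped per language.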

     Erik



>
> Regards
>
> Stefan
>
> -----Original Message-----
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> Sent: Monday, May 2, 2005 16:41
> To: general@lucene.apache.org
> Subject: Re: Highlighting, Keywords and Summarizing
>
>
>
> On May 2, 2005, at 8:25 AM, Schuh, Stefan wrote:
>
>
>> Hi,
>>
>> I'm looking for tools (code) that provide information for:
>>
>> - Highlighting (of search results)
>>
>
> Lucene includes a highlighter in its contrib area.  You can see an
> example of it here: http://www.lucenebook.com/search?query=highlighter
>
> Highlighter is currently in a build-it-yourself state in Lucene's
> Subversion repository; however, it will be released in official
> binary form with Lucene 1.9 in the near future.  Until then, you can
> get a binary of it from the Lucene in Action source code download.
>
>
>> - extracting of keywords (in different languages)
>>
>
> Please elaborate on what you're after here.
>
>
>> - and summarizing of text (giving a short description of a long text)
>>
>
> Classifier4j has a text summarizer:
> http://classifier4j.sourceforge.net/
>
>     Erik
>

