lucene-java-user mailing list archives

From Dawid Weiss <dawid.we...@cs.put.poznan.pl>
Subject Re: Keyphrase Extraction
Date Sun, 06 May 2007 12:33:06 GMT

You could also try splitting the document into paragraphs and running Carrot2's 
Lingo algorithm (www.carrot2.org) at the paragraph level to extract clusters. 
The cluster-labelling routine in Lingo should extract 'key' phrases. The 
analysis is heavily frequency-based, but it may be worth a try.
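
To make the "frequency-based" part concrete, here is a rough sketch in plain 
Java. It is not Lingo and does not use the Carrot2 API; it only counts, for 
each word n-gram, how many paragraphs contain it. The class name, the tiny 
stop-word list and the "at least two paragraphs" cutoff are assumptions made 
for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: ranks word n-grams by how many paragraphs contain them. */
final class ParagraphPhraseCounter {

    // Tiny illustrative stop list (an assumption); a real one would be much longer.
    private static final Set<String> STOP_WORDS = Set.of(
            "a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with");

    /** Returns up to topN phrases of 2..maxWords words, most widespread first. */
    static List<String> topPhrases(String document, int maxWords, int topN) {
        Map<String, Integer> paragraphCounts = new HashMap<>();

        // Paragraphs are assumed to be separated by one or more blank lines.
        for (String paragraph : document.split("\\n\\s*\\n")) {
            // Lower-case and split on anything that is not a letter or digit.
            // Sentence boundaries are ignored, so phrases may span two sentences.
            List<String> tokens = new ArrayList<>();
            for (String word : paragraph.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!word.isEmpty()) {
                    tokens.add(word);
                }
            }

            // Collect each candidate phrase once per paragraph.
            Set<String> seen = new HashSet<>();
            for (int n = 2; n <= maxWords; n++) {
                for (int i = 0; i + n <= tokens.size(); i++) {
                    // Skip phrases that start or end with a stop word.
                    if (STOP_WORDS.contains(tokens.get(i))
                            || STOP_WORDS.contains(tokens.get(i + n - 1))) {
                        continue;
                    }
                    seen.add(String.join(" ", tokens.subList(i, i + n)));
                }
            }
            for (String phrase : seen) {
                paragraphCounts.merge(phrase, 1, Integer::sum);
            }
        }

        // Keep phrases found in at least two paragraphs (arbitrary cutoff).
        List<String> ranked = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : paragraphCounts.entrySet()) {
            if (entry.getValue() >= 2) {
                ranked.add(entry.getKey());
            }
        }
        // Highest paragraph count first; ties broken alphabetically.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(paragraphCounts.get(b), paragraphCounts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(topN, ranked.size())));
    }
}
```

Calling ParagraphPhraseCounter.topPhrases(text, 3, 10) would return up to ten 
two- or three-word phrases. Counting paragraphs rather than raw occurrences 
keeps one repetitive paragraph from dominating the result. Anything beyond 
that (stemming, a proper stop list, scoring against a background corpus) is 
left out of this sketch.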

Dawid

Mark Miller wrote:
> From what I know, you generally have to pay if you want something that 
> does this really well. Or check out http://www.nzdl.org/Kea/
> Unfortunately, the license is GPL. Really too bad; now that it is all 
> Java, it would make a great combo with Lucene.
> 
> - Mark
> 
> mark harwood wrote:
>> I believe the code Otis is referring to is here: 
>> http://issues.apache.org/jira/browse/LUCENE-474
>>
>> This is index-level analysis but could be adapted to work for just a 
>> single document.
>> The implementation is optimised for speed rather than being a thorough 
>> examination of phrase significance.
>> Cheers
>> Mark
>>
>> ----- Original Message ----
>> From: Otis Gospodnetic <otis_gospodnetic@yahoo.com>
>> To: java-user@lucene.apache.org
>> Sent: Monday, 30 April, 2007 4:11:36 AM
>> Subject: Re: Keyphrase Extraction
>>
>> Av, look at Lucene's JIRA and search for Mark Harwood.  I believe he 
>> once contributed something that does this in JIRA.  If you are 
>> interested in a commercial solution, I can recommend LingPipe.
>>
>> Otis
>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
>> Lucene Consulting - http://lucene-consulting.com/
>>
>>
>> ----- Original Message ----
>> From: "av_work@yahoo.com" <av_work@yahoo.com>
>> To: java-user@lucene.apache.org
>> Sent: Sunday, April 29, 2007 5:24:17 PM
>> Subject: Keyphrase Extraction
>>
>> Hi,
>>
>> I tried using MoreLikeThis contrib feature to extract "interesting 
>> terms" from a document. This works very well - but only for SINGLE words.
>>
>> I am looking for a way to extract "keyPHRASES" from a document. Is there 
>> an easy way to achieve this using a Lucene index?
>>
>> Thanks in advance!
>> Av
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org