Message-ID: <46388994.1000603@gmail.com>
Date: Wed, 02 May 2007 08:52:36 -0400
From: Mark Miller
To: java-user@lucene.apache.org
Subject: Re: Keyphrase Extraction
References: <363661.83409.qm@web26009.mail.ukl.yahoo.com>
In-Reply-To: <363661.83409.qm@web26009.mail.ukl.yahoo.com>

From what I know, you generally have to pay if you want something that
does this really well. Or check out http://www.nzdl.org/Kea/

Unfortunately, the license is GPL. Really too bad; now that it is all
Java, it would make a great combo with Lucene.

- Mark

mark harwood wrote:
> I believe the code Otis is referring to is here: http://issues.apache.org/jira/browse/LUCENE-474
>
> This is index-level analysis but could be adapted to work for just a single document.
> The implementation is optimised for speed rather than being a thorough examination of phrase significance.
>
> Cheers
> Mark
>
> ----- Original Message ----
> From: Otis Gospodnetic
> To: java-user@lucene.apache.org
> Sent: Monday, 30 April, 2007 4:11:36 AM
> Subject: Re: Keyphrase Extraction
>
> Av, look at Lucene's JIRA and search for Mark Harwood. I believe he once contributed something that does this in JIRA. If you are interested in a commercial solution, I can recommend LingPipe.
>
> Otis
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> Lucene Consulting - http://lucene-consulting.com/
>
> ----- Original Message ----
> From: "av_work@yahoo.com"
> To: java-user@lucene.apache.org
> Sent: Sunday, April 29, 2007 5:24:17 PM
> Subject: Keyphrase Extraction
>
> Hi,
>
> I tried using the MoreLikeThis contrib feature to extract "interesting terms" from a document. This works very well - but only for SINGLE words.
>
> I am looking for a way to extract "keyPHRASES" from a document. Is there an easy way to achieve this using a Lucene index?
>
> Thanks in advance!
> Av

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org