From: Grant Ingersoll
To: java-user@lucene.apache.org
Subject: Re: Term Extraction
Date: Thu, 13 Aug 2009 09:28:15 -0400
I would just throw your doc into a MemoryIndex (it lives in contrib/memory, I think; it only holds one doc), get the term vector, and do what you need to do with it. So you would kind of be doing indexing, but not really.

On Aug 13, 2009, at 8:43 AM, joe_coder wrote:

>
> Grant, thanks for responding.
>
> My issue is that I am not planning to use Lucene (as I don't need any
> search capability, at least not yet). All I have is a text document, and
> I need to extract keywords and their frequencies (which could be a simple
> split on spaces while tracking the count). But I realize that I would
> need to do some preprocessing to remove stopwords, stem words, and also
> check for synonyms. So I'm wondering if there is already such code in
> Lucene (or any other project) that I can use directly.
>
> Thanks!
>
>
> Grant Ingersoll-6 wrote:
>>
>> On Aug 13, 2009, at 7:40 AM, joe_coder wrote:
>>
>>> I was wondering if there is any way to use the Lucene API directly to
>>> extract terms from a given string. My requirement is that I have a text
>>> document for which I need a term frequency vector (after stemming,
>>> removing stopwords, and synonym checks). The result needs to be the
>>> terms and their frequencies.
>>
>> IndexReader.getTermFreqVector(), assuming you have indexed using term
>> vectors.
>>
>>> Is it possible to get this using any Lucene API? (As I see it, Lucene
>>> also needs to stem, remove stopwords, handle synonyms, etc. before
>>> indexing.) Or is there any Java project that would help me with this?
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Term-Extraction-tp24953406p24953406.html
>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>> using Solr/Lucene:
>> http://www.lucidimagination.com/search

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
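For the no-Lucene case joe_coder describes (split the text, drop stopwords, stem, fold synonyms, count), here is a minimal plain-Java sketch of such a pipeline. It is only a toy stand-in for what a Lucene Analyzer chain does before terms reach a term vector, not Lucene code: the class name TermFrequencies, the stopword set, the synonym map and the suffix-stripping stem() are all hypothetical placeholders, and stem() is deliberately crude (it is not the Porter algorithm). With Lucene itself, the MemoryIndex plus term frequency vector route would use a real analyzer instead.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Toy term-frequency extractor: tokenize, lowercase, drop stopwords,
 * crudely stem, fold synonyms, then count. Every list and rule here is a
 * hypothetical placeholder, not Lucene's analysis code.
 */
public class TermFrequencies {

    // Hypothetical stopword list; a real one would be much longer.
    private static final Set<String> STOPWORDS = Set.of(
            "a", "an", "and", "the", "of", "to", "is", "are", "was", "were", "in", "on");

    // Hypothetical synonym map, keyed by STEMMED form: variants fold onto one term.
    private static final Map<String, String> SYNONYMS =
            Map.of("automobile", "car", "auto", "car");

    /** Crude suffix stripping for illustration only; NOT the Porter stemmer. */
    static String stem(String word) {
        if (word.length() > 5 && word.endsWith("ing")) {
            return word.substring(0, word.length() - 3);
        }
        if (word.length() > 4 && word.endsWith("ed")) {
            return word.substring(0, word.length() - 2);
        }
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    /** Returns term -> frequency, sorted by term for stable output. */
    public static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        // Split on runs of anything that is not a letter or digit (not just
        // spaces), so punctuation does not stick to words.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty() || STOPWORDS.contains(token)) {
                continue;
            }
            // Stem first, then look up synonyms, so plurals still hit the map.
            String term = stem(token);
            term = SYNONYMS.getOrDefault(term, term);
            counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String doc = "The car is parked; cars and automobiles were parking near the park.";
        System.out.println(termFrequencies(doc));
    }
}
```

Running main prints {car=3, near=1, park=3}: "car", "cars" and "automobiles" collapse to one term, as do "parked", "parking" and "park". Stemming before the synonym lookup is a choice made here so that plural variants such as "automobiles" still match the (stemmed) synonym keys.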