lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <gsing...@apache.org>
Subject Re: Term Extraction
Date Thu, 13 Aug 2009 13:28:15 GMT

I would just throw your doc into a MemoryIndex (lives in contrib/ 
memory, I think; it only holds one doc), get the Vector and do what  
you need to do.  So you would kind of be doing indexing, but not really.


On Aug 13, 2009, at 8:43 AM, joe_coder wrote:

>
> Grant, thanks for responding.
>
> My issue is that I am not planning to use lucene ( as I don't need any
> search capability, atleast yet). All I have is a text document and I  
> need to
> extract keywords and their frequency ( which could be a simple split  
> on
> space and tracking the count). But I realize that I would need to do  
> some
> preprocessing to remove stopwords, stem words and also check for  
> synonyms.
> So wondering if there is already such code present in lucene ( or  
> any other
> project ) that I can use directly.
>
> Thanks!
>
>
>
> Grant Ingersoll-6 wrote:
>>
>>
>> On Aug 13, 2009, at 7:40 AM, joe_coder wrote:
>>
>>>
>>> I was wondering if there is any way to directly use Lucene API to
>>> extract
>>> terms from a given string. My requirement is that I have a text
>>> document for
>>> which I need a term frequency vector ( after stemming, removing
>>> stopwords
>>> and synonyms checks ). The result needs to be the terms and  
>>> frequency.
>>
>> IndexReader.getTermFreqVector(), assuming you have indexed using Term
>> Vectors.
>>
>>
>>>
>>> Is it possible to get this using any lucene API? ( As I see lucene
>>> also
>>> needs to stem, remove stopwords, synonyms etc before indexing). Or
>>> is this
>>> any java project that would help me in this?
>>> -- 
>>> View this message in context:
>>> http://www.nabble.com/Term-Extraction-tp24953406p24953406.html
>>> Sent from the Lucene - Java Users mailing list archive at  
>>> Nabble.com.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>> using Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message