lucene-java-user mailing list archives

From Alf Eaton
Subject Stemmed terms/common terms
Date Thu, 16 Aug 2007 14:17:00 GMT
A couple of questions about term frequencies and stemming:

- What's the best way to get the most common unstemmed form of a
Porter-stemmed word from the index? For example, given the stem
'walk', find that 'walking' is the most common full word in the index.

- Is there a way to get a list of all the terms in the index (or
maybe just the top n) ordered by descending frequency of usage? I
imagine it's related to docFreq, but I can't see how to enumerate the
terms across all documents.
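
To make both questions concrete, here is a rough pure-Python sketch of the
results I'm after. It doesn't touch Lucene at all: toy_stem is a hypothetical
stand-in for the Porter stemmer, the token list stands in for the unstemmed
words in an index, and the function names are made up for illustration.

```python
from collections import Counter, defaultdict
import heapq


def toy_stem(word):
    """Hypothetical stand-in for a Porter stemmer: strips a few suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def most_common_forms(tokens, stem=toy_stem):
    """Map each stem to the unstemmed word that occurs most often."""
    counts = defaultdict(Counter)
    for tok in tokens:
        counts[stem(tok)][tok] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def top_terms(term_freqs, n):
    """Return the n (term, frequency) pairs with the highest frequency."""
    return heapq.nlargest(n, term_freqs.items(), key=lambda kv: kv[1])


# Toy data standing in for the words of an index (an assumption).
tokens = "walking walked walking walks walk talked talking talked".split()

forms = most_common_forms(tokens)    # question 1: stem -> commonest word
top = top_terms(Counter(tokens), 2)  # question 2: top n terms by frequency
```

With this toy data, forms["walk"] comes out as 'walking'. What I'm missing is
the Lucene-side equivalent of iterating over every term with its frequency.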

I'm using PyLucene and Solr, so if there are easy solutions in either  
of those that would be ideal.

