lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrzej Bialecki>
Subject Re: graphically representing an index
Date Thu, 31 Aug 2006 13:49:19 GMT
> Hi all,
> I'm a newbie with Lucene and I'm looking to implement the following:
> I want to index posts from a forum, and, rather than proposing a search
> on the contents, graphically represent the contents of the index. More
> precisely, I would like to have a list of the most popular words, with a
> number next to each indicating how often they occur. 
> The icing on the cake would be to be able to click on such a word and
> get a subset of the posts including that word. 
> Can Lucene be used for this? Has anyone already implemented it? Any
> links?
> I've dug around a bit without any success, but my apologies if this has
> already been dealt with

See for an example of such functionality. 
However, I must disappoint you - the most frequent words in a corpus are 
quite probably also most useless words. For English these are: the, a, 
to, for, by, in, can, I, ...
 So, you will need to eliminate them from the top of the list to get any 
useful results.

Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration  Contact: info at sigram dot com

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message