lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "SOMMERIA KLEIN Ariel Ext VIACCESS-BU_DRM" <a.klein....@viaccess.com>
Subject RE: graphically representing an index
Date Fri, 01 Sep 2006 12:46:28 GMT
Hi Andzej,
Thanks for the tip, it does what I want. You are right, though, it's of limited use for helping
the user access data. But I'm sure it will come in handy for my own analysis.
Best,
Ariel

-----Message d'origine-----
De : Andrzej Bialecki [mailto:ab@getopt.org] 
Envoyé : jeudi 31 août 2006 15:49
À : java-user@lucene.apache.org
Objet : Re: graphically representing an index

SOMMERIA KLEIN Ariel Ext VIACCESS-BU_DRM wrote:
> Hi all,
> I'm a newbie with Lucene and I'm looking to implement the following:
> I want to index posts from a forum, and, rather than proposing a search
> on the contents, graphically represent the contents of the index. More
> precisely, I would like to have a list of the most popular words, with a
> number next to each indicating how often they occur. 
> The icing on the cake would be to be able to click on such a word and
> get a subset of the posts including that word. 
> Can Lucene be used for this? Has anyone already implemented it? Any
> links?
> I've dug around a bit without any success, but my apologies if this has
> already been dealt with
>
>   

See http://www.getopt.org/luke for an example of such functionality. 
However, I must disappoint you - the most frequent words in a corpus are 
quite probably also most useless words. For English these are: the, a, 
to, for, by, in, can, I, ...
 So, you will need to eliminate them from the top of the list to get any 
useful results.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



-----------------------------------------

"Privileged/Confidential information may be contained in this e-mail 
and attachments. This e-mail, including attachments, constitutes non-public information intended
to be conveyed only to the designated recipient(s). If you are not an intended recipient,
please delete this e-mail, including attachments, and notify us immediately. The unauthorized
use, dissemination, distribution or reproduction of this e-mail, including attachments, is
prohibited and may be unlawful. In general, the content of this e-mail and attachments does
not constitute any form of commitment by VIACCESS SA."

-----------------------------------------


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message