lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dominique Béjean <dominique.bej...@eolya.fr>
Subject RE: Problems about using Lucene to generate tag cloud..
Date Tue, 01 Apr 2008 12:30:11 GMT
On www.crossfeeds.com, I use this method in order to update hourly a tag
cloud based on the title of 20.000 RSS articles (RSS published during the
last 24 hours). It takes 1 minute.
 

-----Message d'origine-----
De : wuqi [mailto:chee.wu@gmail.com] 
Envoyé : mardi 1 avril 2008 14:10
À : java-user@lucene.apache.org
Objet : Re: Problems about using Lucene to generate tag cloud..

so build  a index for the dynamically generated docucements set ,and then
try to find frequency for each terms in this index... not sure it's fast
enoug.but it's worth to have a try...
Thank you  Doinique!
----- Original Message ----- 
From: "Dominique Béjean" <dominique.bejean@eolya.fr>
To: <java-user@lucene.apache.org>
Sent: Tuesday, April 01, 2008 3:51 PM
Subject: RE: Problems about using Lucene to generate tag cloud..


May be you can index the set of documents in a temporary index. This index
needs only one field (tag).

Then you can browse the terms collection of the index and get each couple
term/frequency 

        IndexReader reader = IndexReader.open(temp_index);
        TermEnum terms = reader.terms();

        while (terms.next()) {
            String field = terms.term().field();

            if (!"tag".equals(field)) continue;

            String term = terms.term().text();
            int freq = terms.docFreq();
        }

        terms.close();
        reader.close();



-----Message d'origine-----
De : wuqi [mailto:chee.wu@gmail.com] 
Envoyé : lundi 31 mars 2008 09:07
À : java-user@lucene.apache.org
Objet : Problems about using Lucene to generate tag cloud..

Hi,
I am trying to use Lucene index to implement a tag cloud  system. I add a
new field  named "tags" in index to  store all the tags,and we don't support
tags with more than one word, so different tags of the same document just
are separate by white space.  The "tags" filed in one document  may looks
like this :
doc1  tags : travel Beijing  news
doc2  tags:  beijing sports news
I can easily retrieve tags related with single document,and also get the
documents related with certain tag, but it's hard  find a "efficient" way to
get frequent tags  from a "set" of documents of this index.Tthe set of the
documents is always generated dynamically, may be a search result, a
dynamically generated category through clustering. The document set is very
large, maybe several ten thousands or several hundred thousands.So simply
iterate all  the documents in the set and find the frequent tags might not
be applicable.Do you have any better idea ?

Thanks
-Qi


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message