lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Glen Newton <glen.new...@gmail.com>
Subject Re: Creating tag clouds with lucene
Date Thu, 05 Nov 2009 16:03:47 GMT
Mathias,

This is a special case: I am doing this on fields that have single
term entries, or are treated like a single term (like 'author') and
not parsed.
Here is what I am doing:
//Loop through the top N Hits (this application is Lucene 2.2, so it
is still using Hits).
Foreach Document
{
  add the term in the Field to a HashMap<String,Integer>, <term,
count>, incrementing existing entries;
}
Sort the hash by count, taking the top M
Sort top M by term
Display tag cloud.
----------

If you were using full-text, I think you would instead would use the
 TermDocs IndexReader.termDocs(Term term)
and take the top N terms & add to the hash.

An issue is likely whether you want to have multiple word phrases in
your tag cloud, like in the above example "Cell adhesion molecules".
You would have to play with your Analyzer to get things like this.
N-grams for terms would handle this? Anyone?

thanks,

Glen



2009/11/5 Mathias Bank <mathias.bank@gmail.com>:
> Hi Glen,
>
> great, that is exactly what I'm looking for. How are you doing this?
>
> Mathias
>
> 2009/11/5 Glen Newton <glen.newton@gmail.com>:
>> Yes. I do it here in Ungava, on a search of "cancer" and "cell" in title:
>>  http://lab.cisti-icist.nrc-cnrc.gc.ca/ungava01/Search?tagCloud=true&collection=csu&tagField=keyword&title=cell&title=cancer&numCloudDocs=200&numCloudTags=50
>>
>> and here on full-text:
>>  http://lab.cisti-icist.nrc-cnrc.gc.ca/ungava/Search?tagCloud=true&collection=jos&tagField=keyword&contents=cell&contents=cancer&numCloudDocs=200&numCloudTags=50
>>
>> -glen
>>
>> 2009/11/5 Mathias Bank <mathias.bank@gmail.com>:
>>> Hi,
>>>
>>> I want to calculate a tag cload for search results. I have seen, that
>>> it is possible to extract the top 20 words out of the lucene index. Is
>>> there also a possibility to extract the top 20 words out of search
>>> results (or filter results) in lucene?
>>>
>>> Mathias
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>>
>>
>> --
>>
>> -
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>



-- 

-

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message