lucene-java-user mailing list archives

From Mathias Bank <mathias.b...@gmail.com>
Subject Re: Creating tag clouds with lucene
Date Fri, 06 Nov 2009 08:25:52 GMT
Well, it could be a facet search if tags were available, but if you just
want a "tag cloud" generated from the full text, I don't see how a facet
search could help generate this cloud.
Unfortunately, I don't have tags in my data. What I need is information
about which terms (or multi-word terms) occur most often in this data.
At first I thought of using Carrot2, which uses a specialized clustering
algorithm. But I wondered whether it is possible to get the most
frequently used terms out of Lucene directly.
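Once you have a document count per term, picking "the most used terms" is a top-N selection. Below is a minimal sketch. It assumes the (term, document frequency) pairs have already been read into a `Map`. In Lucene 2.9/3.0 they could come from iterating `IndexReader.terms()` and reading `TermEnum.docFreq()`, but that wiring is not shown here. The class and method names are hypothetical. A bounded min-heap keeps only N candidates at a time instead of sorting every distinct term.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopTerms {
    // Returns the n terms with the highest document frequency, most frequent first.
    // termDocFreq is a hypothetical stand-in for (term, docFreq) pairs read from an index.
    static List<String> topN(Map<String, Integer> termDocFreq, int n) {
        // Min-heap on frequency: the root is the weakest of the current top-n candidates.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Integer> e : termDocFreq.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // evict the least frequent candidate
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey()); // polled in ascending frequency
        }
        Collections.reverse(result); // most frequent first
        return result;
    }
}
```

For example, `TopTerms.topN(Map.of("lucene", 5, "search", 3, "facet", 1, "tag", 4), 2)` returns `[lucene, tag]`. Terms with equal counts come out in no particular order.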

Glen mentioned that he does this for full-text data, using the
IndexReader.termDocs(Term term) method. So I think he iterates over all
terms and counts how many documents each term occurs in. But what I don't
see is how this works with a filter. Do you first collect all documents
that match the filter and then iterate over all terms, counting only
documents in this filtered set? I can't imagine that this performs well,
because I have more than 10 million documents (and the number is growing
fast).
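The two ways of answering the filter question can be modelled side by side. This rough sketch uses plain in-memory maps rather than Lucene. `postings` (term to doc ids) and `docTerms` (doc id to distinct terms) are hypothetical stand-ins, and a `java.util.BitSet` plays the role of the documents accepted by the filter.

`countByTerm` is the term-at-a-time loop described above. In Lucene 2.9/3.0 it would correspond roughly to walking `IndexReader.terms()` and, for each term, the `TermDocs` from `IndexReader.termDocs(Term)`.

`countByDoc` is the document-at-a-time alternative, closer to Chris Lu's "go through the search results" suggestion. It assumes each document's terms are available, for example from term vectors. That is an assumption, not something shown here.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class FilteredTermCounts {
    // Term-at-a-time: walk each term's postings and count the docs the filter accepts.
    // postings is a hypothetical in-memory inverted index: term -> doc ids.
    // Work grows with the total number of postings, however small the filter is.
    static Map<String, Integer> countByTerm(Map<String, int[]> postings, BitSet filter) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            int c = 0;
            for (int doc : e.getValue()) {
                if (filter.get(doc)) c++;
            }
            if (c > 0) counts.put(e.getKey(), c); // keep only terms seen in the filtered set
        }
        return counts;
    }

    // Document-at-a-time: visit only the filtered docs and tally their distinct terms.
    // docTerms is a hypothetical forward index: doc id -> distinct terms in that doc.
    // Work grows with the number of filtered docs times the terms per doc.
    static Map<String, Integer> countByDoc(Map<Integer, Set<String>> docTerms, BitSet filter) {
        Map<String, Integer> counts = new HashMap<>();
        for (int doc = filter.nextSetBit(0); doc >= 0; doc = filter.nextSetBit(doc + 1)) {
            Set<String> terms = docTerms.get(doc);
            if (terms == null) continue; // no term list for this doc
            for (String t : terms) {
                counts.merge(t, 1, Integer::sum); // +1 per document, not per occurrence
            }
        }
        return counts;
    }
}
```

Both methods return the same counts when the two maps describe the same data. What differs is the work done:

* The term-at-a-time loop touches every posting in the index even when the filter matches only a handful of documents. This is why it scales poorly to a large index.
* The document-at-a-time loop touches only the filtered documents, but it requires a per-document term list.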

Mathias

2009/11/6 Chris Lu <chris.lu@gmail.com>:
> Isn't the tag cloud just another facet search? The only difference is that
> the tag is multi-valued.
>
> Basically just go through the search results and find all unique tag values.
>
> --
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> DBSight customer, a shopping comparison site, (anonymous per request) got
> 2.6 Million Euro funding!
>
>
> Mathias Bank wrote:
>>
>> Hi,
>>
>> I want to calculate a tag cloud for search results. I have seen that
>> it is possible to extract the top 20 words from the Lucene index. Is
>> there also a way to extract the top 20 words from search results
>> (or filter results) in Lucene?
>>
>> Mathias
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
