lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Lu <chris...@gmail.com>
Subject Re: Creating tag clouds with lucene
Date Fri, 06 Nov 2009 03:44:48 GMT
Interesting idea! Kind of "cheating" because the word frequency in the 
whole index is simply mapped to the search results, which is arguable.

But maybe in practice it could work just fine, since nobody really cares 
about the counts anyway.
When users click the tag cloud, did anyone really have cared about the 
frequency in the search results?

DBSight uses the multi-valued facet search approach to do tag cloud. 
Maybe I can "cheat" it this way also... It does save some memory.

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro
funding!


Jake Mannix wrote:
> Well you can do it as a facet search, but in addition to doing multi-valued
> faceting, you can also normalize the counts by dividing by the docFreq of
> the term, which instead of getting you the most popular tags which overlap
> your query, you get the tags which are more popular for documents matching
> your query relative to how popular those tags are in general, which is a way
> of getting the tags "related to" your query.  This can be done pretty easily
> within bobo (I just whipped up the code to do this while eating dinner just
> now, in fact, let me know if you want to try out that way of doing it, and I
> can walk you through the bobo code you'd need to write for this), and it's
> probably not too hard to do in Solr either.
>
> How big your index is (and how many tags per document there are, and how
> many unique tags there are) will have a big impact on how performant this
> query is, of course, but in my experience as long as this is a typical tag
> case (with only a handful of values per document), this can be done not much
> slower than your original query.
>
>   -jake
>
> On Thu, Nov 5, 2009 at 7:01 PM, Chris Lu <chris.lu@gmail.com> wrote:
>
>   
>> Isn't the tag cloud just another facet search? Only difference is the tag
>> is multi-valued.
>>
>> Basically just go through the search results and find all unique tag
>> values.
>>
>> --
>> Chris Lu
>> -------------------------
>> Instant Scalable Full-Text Search On Any Database/Application
>> site: http://www.dbsight.net
>> demo: http://search.dbsight.com
>> Lucene Database Search in 3 minutes:
>> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
>> DBSight customer, a shopping comparison site, (anonymous per request) got
>> 2.6 Million Euro funding!
>>
>>
>>
>> Mathias Bank wrote:
>>
>>     
>>> Hi,
>>>
>>> I want to calculate a tag cload for search results. I have seen, that
>>> it is possible to extract the top 20 words out of the lucene index. Is
>>> there also a possibility to extract the top 20 words out of search
>>> results (or filter results) in lucene?
>>>
>>> Mathias
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>>       
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>     
>
>   

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message