lucene-java-user mailing list archives

From "N. Hira" <nh...@cognocys.com>
Subject Re: Link map over results? or term freq
Date Thu, 16 Oct 2008 21:17:28 GMT
I think what you're describing as a "link map" is a "tag cloud"
where each tag is a "frequent" or "strong" term.

We did something like this as an experiment (without Lucene):
http://www.cognocys.com/prospector/news.html

If you're talking about something similar, then I think you can use
Lucene's term frequency vectors (TFVs) only to get at the frequency
data per Document, not per result set.  I'm no expert, but I say this
because I've only ever seen TermFreqVector discussed in the context of
an IndexReader, not in the context of Hits or TopDocs:
http://lucene.apache.org/java/2_4_0/api/core/org/apache/lucene/index/class-use/TermFreqVector.html

The other thing, though, is that TF may not be sufficient to  
determine what to use for each tag/link.  For example, given a set of  
Results, R, would you like to use:
1.  the top N most frequent terms for each Document in R?
2.  the top M most frequent terms that are common to all/many  
Documents in R?
3.  the top O most frequent terms that are common in results built  
using the highlighter?
...
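
A rough sketch of options 1 and 2, in plain Java with no Lucene calls.
It assumes each Document in R has already been reduced to a map of
term -> frequency.  In Lucene 2.4 I believe you could fill such a map
from IndexReader.getTermFreqVector(docId, field) using getTerms() and
getTermFrequencies(), provided the field was indexed with term vectors.
The class and method names below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; none of these names come from Lucene.
class ResultSetTerms {

    // Option 1: the n most frequent terms of a single document.
    static List<String> topTerms(Map<String, Integer> freqs, int n) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<>(freqs.entrySet());
        // Highest count first; ties broken alphabetically so output is stable.
        entries.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    // Option 2: the m most frequent terms (summed over R) among those that
    // occur in at least minDocs of the documents in R.
    static List<String> topCommonTerms(List<Map<String, Integer>> docs,
                                       int minDocs, int m) {
        Map<String, Integer> totalFreq = new HashMap<>();
        Map<String, Integer> docCount = new HashMap<>();
        for (Map<String, Integer> doc : docs) {
            for (Map.Entry<String, Integer> e : doc.entrySet()) {
                totalFreq.merge(e.getKey(), e.getValue(), Integer::sum);
                docCount.merge(e.getKey(), 1, Integer::sum);
            }
        }
        // Keep only the terms that appear in enough documents.
        Map<String, Integer> common = new HashMap<>();
        for (Map.Entry<String, Integer> e : totalFreq.entrySet()) {
            if (docCount.get(e.getKey()) >= minDocs) {
                common.put(e.getKey(), e.getValue());
            }
        }
        return topTerms(common, m);
    }
}
```

Setting minDocs to the size of R gives the "common to all" reading;
a smaller value gives "common to many".  Raw counts will favour very
common words, so some stop-word filtering or weighting is probably
needed before the output is useful.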

To a certain extent, this is a clustering problem: given some set
of Documents, R, which just happen to be the results of some search,
represent R using a tag cloud/link map of the terms that best represent R.
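
Once you have one weight per term for R, however you arrive at it,
turning the weights into a cloud where dominant terms get a larger
font is mostly a scaling step.  A minimal sketch, assuming integer
weights and a simple linear scale between a minimum and a maximum
font size (the class name, method name and pixel values are
hypothetical):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Lucene or carrot2.
class TagCloudSizes {

    // Maps each term's weight linearly onto the range [minPx, maxPx].
    static Map<String, Integer> fontSizes(Map<String, Integer> weights,
                                          int minPx, int maxPx) {
        Map<String, Integer> sizes = new HashMap<>();
        if (weights.isEmpty()) {
            return sizes;
        }
        int lo = Collections.min(weights.values());
        int hi = Collections.max(weights.values());
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            // If every weight is equal there is nothing to scale; use minPx.
            int size = (hi == lo)
                    ? minPx
                    : minPx + (e.getValue() - lo) * (maxPx - minPx) / (hi - lo);
            sizes.put(e.getKey(), size);
        }
        return sizes;
    }
}
```

A linear scale lets one very frequent term dwarf everything else, so
scaling the logarithm of the weight instead may look better.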

Have you looked at carrot2?  I haven't seen the tag cloud
visualization there, but you may find some ideas for
clustering/document-set representation there:
http://project.carrot2.org/


Good luck!

-h

On 16-Oct-2008, at 3:21 PM, Darren Govoni wrote:

> I guess a link map (as I understand it) is a collection of hyperlinks
> of words/phrases where the dominant ones are in a bolder color and a
> larger font. It's a relatively new scheme that some sites are using.
>
> For example, someone searches for a person and a link map would show
> them all the most frequent terms in the results they got back. Sort of
> like latent relationships.
>
> Does that help?
>
> I thought this could be done using term frequency vectors in Lucene,
> but I've never used TFVs before. And I thought it could then be
> limited to just a set of results.
>
> HTH,
> Darren
>
> On Thu, 2008-10-16 at 14:09 -0400, Glen Newton wrote:
>> Sorry, could you explain what you mean by a "link map over lucene
>> results"?
>>
>> thanks,
>> -glen
>>
>> 2008/10/16 Darren Govoni <darren@ontrenet.com>:
>>> Hi,
>>> Has anyone created a link map over lucene results, or does anyone
>>> know of a link describing the process? If not, I would like to
>>> build one to contribute.
>>>
>>> Also, I read about term frequencies in the book, but wanted to know
>>> if I can extract the strongest occurring terms from a given result
>>> set or result?
>>>
>>> thank you for any help. I will keep reading/looking.
>>>
>>> Darren
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>





