lucene-solr-user mailing list archives

From "Vaijanath N. Rao" <vaiju1...@gmail.com>
Subject Re: TermVectorComponent for tag generation?
Date Sat, 01 Nov 2008 05:48:07 GMT
Hi Jon,

Isn't this similar to what Grant just said: take the top-most terms (after 
removing the stop words)?

You would need to get how many terms there are and their relative 
frequencies, and any term whose frequency is above a certain threshold 
you would mark as a member of the tag set.
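
A rough sketch of that thresholding idea, in plain Python rather than 
against the Lucene/Solr API. The stop word list, the function name 
candidate_tags and the min_relative_freq parameter are all made up for 
illustration; in Solr the counts would come from the index, not from 
re-tokenizing the text.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list; a real analyzer would use a fuller one.
STOP_WORDS = {"a", "an", "and", "as", "at", "be", "been", "can", "for",
              "has", "in", "is", "of", "the", "to", "which"}


def candidate_tags(text, min_relative_freq=0.05):
    """Return terms whose share of the non-stop-word tokens meets the threshold."""
    # Lowercase and split on anything that is not a letter or digit.
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower())
              if t not in STOP_WORDS]
    if not tokens:
        return []
    counts = Counter(tokens)
    total = len(tokens)
    # Keep a term only if its relative frequency reaches the cutoff.
    return sorted(term for term, n in counts.items()
                  if n / total >= min_relative_freq)
```

The threshold here is relative to the document length, so it would 
probably need tuning per field; an absolute count cutoff is an equally 
reasonable choice.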

One can also build a set of related entities or terms that follow the 
current term, and then decide which of them can become part of the 
tag set.
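
One possible reading of "terms following the current term" is a simple 
count of which word comes directly after which. The sketch below assumes 
that reading; the names following_terms, related_tags and min_count are 
hypothetical, and nothing here uses term vectors or any Solr component.

```python
import re
from collections import Counter, defaultdict


def following_terms(text):
    """Map each term to a Counter of the terms that immediately follow it."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    followers = defaultdict(Counter)
    # Pair every token with its right-hand neighbour.
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current][nxt] += 1
    return followers


def related_tags(text, term, min_count=2):
    """Return terms that follow `term` at least `min_count` times."""
    followers = following_terms(text)[term.lower()]
    return sorted(t for t, n in followers.items() if n >= min_count)
```

With something like this, "Microsoft" followed often enough by "Word" 
could be promoted to a single tag "Microsoft Word", which is the kind of 
decision described above.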

Is that the requirement, or am I missing something here?

-- Thanks and Regards
Vaijanath N. Rao

Jon Baer wrote:
> Well, for example, in any given text (which is a field on a document):
>
> "While suitable for any application which requires full text indexing 
> and searching capability, Lucene has been widely recognized for its 
> utility in the implementation of Internet search engines and local, 
> single-site searching.
>
> At the core of Lucene's logical architecture is the idea of a document 
> containing fields of text. This flexibility allows Lucene's API to be 
> independent of file format. Text from PDFs, HTML, Microsoft Word 
> documents, as well as many others can all be indexed so long as their 
> textual information can be extracted."
>
> I'd like to be able to say the tags for this article should be [Lucene, 
> PDF, HTML, Microsoft Word] because they are in field values from other 
> documents.  Basically how to generate tags from just a single document 
> based on other document field values.
>
> - Jon
>
>
> On Oct 31, 2008, at 6:17 PM, Grant Ingersoll wrote:
>
>> Hey Jon,
>>
>> Not following how the TVC (TermVectorComp) would help here.    I 
>> suppose you could use the "most important" terms, as defined by 
>> TF-IDF, as suggested tags.  The MLT (MoreLikeThis) uses this to 
>> generate query terms.
>>
>> However, I'm not following the different filter query piece.  Can you 
>> provide a bit more details?
>>
>> One thing you did make me think, though, is it might be interesting 
>> to extend TermVectorMapper so that it can output a NamedList and then 
>> allow people to implement their own SolrTermVectorMapper and have it 
>> customize the TV output...
>>
>> Thanks,
>> Grant
>>
>> On Oct 31, 2008, at 5:20 PM, Jon Baer wrote:
>>
>>> Hi,
>>>
>>> So I'm looking to either use this or build a component which might do 
>>> what I'm looking for.  I'd like to figure out if it's possible to use a 
>>> single doc to get tag generation based on the matches within that 
>>> document, for example:
>>>
>>> 1 News Doc -> contains 5 Players and 8 Teams (show them as possible 
>>> tags for this article)
>>>
>>> In this case Players and Teams are also docs.  It's almost like I 
>>> want to use MoreLikeThis w/ a different filter query than what I'm 
>>> using.
>>>
>>> Is there any easy hack to get this going?
>>>
>>> Thanks.
>>>
>>> - Jon
>>
>> --------------------------
>> Grant Ingersoll
>> Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
>> http://www.lucenebootcamp.com
>>
>>
>> Lucene Helpful Hints:
>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>> http://wiki.apache.org/lucene-java/LuceneFAQ

