mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: [slightly off topic] Determining Importance
Date Mon, 03 Jan 2011 18:43:53 GMT

On Jan 3, 2011, at 12:02 PM, Ted Dunning wrote:

> I think that you have identified an interesting cross-cutting category.
> 
> PageRank, HITS and the related algorithms tend to be classified as "link
> analysis".
> Priority inboxes tend to be classified as "classifiers".
> Click-through predictors are often termed "recommendation".
> 
> In all these cases, the nomenclature is all about the implementation
> approach, not about the goal.
> 
> Importance modeling as you describe it is all about the goal, not about the
> method.

Yes!  This is what I'm after!  I'd like more background on the theory of importance as well as on
the implementation side of it.  I figured there has to be some academic work on it, but the terms
are pretty ambiguous, so it's hard to get good results...



> 
> On Mon, Jan 3, 2011 at 8:54 AM, Grant Ingersoll <gsingers@apache.org> wrote:
> 
>> Hi,
>> 
>> I wanted to pick people's brains a little bit on the subject of determining
>> importance.  This isn't necessarily Mahout related, although I think we have
>> some tools that help in the area.
>> 
>> One of the emerging trends these days, it seems, with all our connectivity
>> and content is a notion of importance/priority.  Some examples:
>> 1. Google now has "Priority Inbox", for instance, and I think most would
>> agree that for things like Twitter and Facebook it would be really nice if
>> you could separate the important updates/people from the less important ones.
>> 2. Identifying important phrases, etc. in text across a corpus.
>> 3. One of the things I think most researchers do when exploring a new topic
>> is to identify the one or two seminal papers in the field, read them, and
>> then read the ones that cite those papers and so on.
>> 4. Take in all the day's news and figure out what the key articles are to
>> read (in some sense it's picking the most representative document in a
>> cluster), or determine that the article talking about raising Federal income
>> taxes is likely more important than the one talking about raising local
>> sales tax (or vice versa!)
>> 5. PageRank, TextRank, and other approaches to calculating authority
>> 
>> What I'm looking for is help in researching this area.  Is there a name for
>> this (sub-)field (importance theory? prioritization theory?), particularly
>> in machine learning and NLP, that is geared towards this?  I realize some
>> (most) of these problems can be solved with classifiers, among other tools
>> like graph algorithms (particularly ones that use the social graph), but it
>> also seems like the area is bigger than a particular implementation, so I
>> wanted to hear what others thought.  How would you go about solving these
>> problems?  Do you have any pointers to useful references on the subject
>> (theoretical or practical)?  What other examples have you run up against?
>> 
>> Thanks,
>> Grant

--------------------------
Grant Ingersoll
http://www.lucidimagination.com

