mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: newbie intro
Date Fri, 25 Sep 2009 15:56:54 GMT
This may help in the margins, but it is surprising how well simpler methods
work.

tf-idf is, btw, an approximation of the LLR (log-likelihood ratio) score.
There are some interesting edge conditions where the approximation breaks,
notably when there are several occurrences in the text of interest.
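
For reference, here is a minimal sketch of the LLR score for a 2x2 table of
counts, written in the entropy form. The table layout is an assumption made
for illustration: k11 = occurrences of the term in the text of interest,
k12 = all other tokens in that text, k21 = occurrences of the term in the
rest of the corpus, k22 = all other tokens in the rest of the corpus. The
function names are hypothetical, and this is not necessarily how any given
library computes it.

```python
import math


def xlogx(x):
    """Return x * ln(x), using the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy of counts: N*ln(N) - sum(k*ln(k))."""
    return xlogx(sum(counts)) - sum(xlogx(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) statistic for a 2x2 contingency table.

    Assumed layout (illustrative only):
      k11: term count in the text of interest
      k12: other tokens in the text of interest
      k21: term count in the rest of the corpus
      k22: other tokens in the rest of the corpus
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    # Clamp at zero to hide tiny negative values caused by rounding.
    return max(0.0, 2.0 * (row_entropy + col_entropy - mat_entropy))


if __name__ == "__main__":
    # Term is equally common inside and outside the text: score near 0.
    print(llr(10, 990, 100, 9900))
    # Term is much more common in the text of interest: large score.
    print(llr(10, 990, 10, 99990))
```

A score near zero means the term is about as frequent in the text of
interest as elsewhere; larger scores mean the two frequencies differ more
than chance would suggest. Unlike tf-idf, the score depends on all four
counts, which is one way to see why the two can diverge when a term occurs
several times in the text of interest.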

On Fri, Sep 25, 2009 at 5:37 AM, Isabel Drost <isabel@apache.org> wrote:

> So I think, POS tags and TFIDF should be features determining whether
> a phrase should be considered as key phrase or not - maybe even key
> indicators to generate a key phrase candidate set. But there may be many
> more features. Lastly it might be easier to come up with a
> training set of good and bad phrases (plus their feature vectors) and
> let a classifier do the selection compared to manually hand coding the
> rules and feature weights for phrase selection.
>



-- 
Ted Dunning, CTO
DeepDyve
