mahout-user mailing list archives

From: Ted Dunning <ted.dunn...@gmail.com>
Subject: Re: Fuzzy logic and Heuristics vs Classification
Date: Tue, 28 Jun 2011 04:55:01 GMT
Yeah... what Hector says.

You can even make the output of preliminary classifiers be features for new
classifiers.

Or, if you have two different target variables, you can use the output of a
model that predicts one target as a feature in the model that predicts the
other.
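
A minimal sketch of feeding a preliminary classifier's score in as an extra
feature, assuming Mahout's SGD OnlineLogisticRegression; the feature sizes
here are illustrative and the preliminary model's training loop is omitted:

import java.util.Iterator;

import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

public class StackedFeatures {
  public static void main(String[] args) {
    int numWords = 10000;            // bag-of-words dimensionality (illustrative)
    int numFeatures = numWords + 1;  // one extra slot for the preliminary score

    // Preliminary model trained only on the bag of words (training omitted here).
    OnlineLogisticRegression preliminary =
        new OnlineLogisticRegression(2, numWords, new L1());

    // Second model that sees the words plus the preliminary model's score.
    OnlineLogisticRegression combined =
        new OnlineLogisticRegression(2, numFeatures, new L1());

    // One training example: word counts for a document plus its known label.
    Vector words = new RandomAccessSparseVector(numWords);
    // ... fill 'words' with token counts ...
    int label = 1;

    // Copy the word features, then append the preliminary score as a new feature.
    Vector stacked = new RandomAccessSparseVector(numFeatures);
    Iterator<Vector.Element> it = words.iterateNonZero();
    while (it.hasNext()) {
      Vector.Element e = it.next();
      stacked.set(e.index(), e.get());
    }
    stacked.set(numWords, preliminary.classifyScalar(words));

    combined.train(label, stacked);
  }
}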

Feature extraction generally has more potential for performance improvements
than any algorithm changes.
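
And a rough sketch of the heuristic side Hector describes in the quoted
message below (make the pattern count a feature, then L2-normalize each
feature category separately before concatenating), using Mahout's Vector
classes; the class and method names here are illustrative, not a Mahout API:

import java.util.Iterator;

import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

public class HeuristicFeatures {

  // Concatenate two feature categories -- heuristic pattern counts and the
  // regular bag of words -- after L2-normalizing each category separately,
  // so neither category dominates just because of its scale.
  static Vector encode(Vector patternCounts, Vector wordCounts) {
    Vector patterns = l2Normalize(patternCounts);
    Vector words = l2Normalize(wordCounts);

    Vector out = new RandomAccessSparseVector(patterns.size() + words.size());
    copyInto(out, patterns, 0);
    copyInto(out, words, patterns.size());
    return out;
  }

  private static Vector l2Normalize(Vector v) {
    double norm = v.norm(2);
    return norm == 0 ? v : v.divide(norm);
  }

  private static void copyInto(Vector target, Vector source, int offset) {
    Iterator<Vector.Element> it = source.iterateNonZero();
    while (it.hasNext()) {
      Vector.Element e = it.next();
      target.set(offset + e.index(), e.get());
    }
  }
}

The concatenated vector then goes to whatever classifier you were already
training.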

On Mon, Jun 27, 2011 at 7:22 PM, Hector Yee <hector.yee@gmail.com> wrote:

> Redacted to pass the overly aggressive spam filter.
>
> On Mon, Jun 27, 2011 at 7:19 PM, Hector Yee <hector.yee@gmail.com> wrote:
>
> > Just make the pattern a feature and feed it into the machine learning.
> >
> > e.g. if it's a spam model and you notice v**gra is a spam term, just make
> > feature 0 = "v**gra count" and the rest your regular bag of words.
> >
> > The only thing you have to be careful of is the relative weight of each
> > feature category. A typical normalization is to L2-normalize each feature
> > category separately before concatenation.
> > Another option is to use a "scale-free" classification algorithm like
> > AdaBoost.
> >
> >
> > On Mon, Jun 27, 2011 at 5:51 PM, Patrick Collins <
> > patrick.collins@ready2sign.com> wrote:
> >
> >> Has anyone got any advice on how to combine heuristics and
> >> classification?
> >>
> >> When preparing my data to build the features to feed into my
> >> classification model, I keep noticing patterns of text which I know, with
> >> 99.99% probability, imply a certain outcome.
> >>
> >> How would you construct the data/features in order to pre-classify this
> >> data and make it much more likely that the classifier comes to the
> >> "correct" conclusion?
> >>
> >> For example, I remember seeing an anti-spam system which used a
> >> combination of fuzzy logic and classification to produce a better outcome
> >> (though the author did not detail how it was actually implemented). It
> >> used a whole range of heuristics to determine that a certain sender is a
> >> known spammer, rather than just blindly passing this data into the
> >> classifier.
> >>
> >> In my dataset I have a LOT of patterns like this that I can identify and
> >> then use to determine the outcome with very high probability. I say high
> >> probability, but not certainty. Ideally, if I could precompute a lot of
> >> this data using heuristics, I could feed that information into the
> >> classifier and greatly reduce the number of features. But the classifiers
> >> do not let me assign a "weight" to a particular feature.
> >>
> >> Other than "well, just try and see what works", I was wondering how
> >> people deal with this problem. Do they just leave it to the classifier
> >> and hope that it picks up the same patterns?
> >>
> >> I'm a bit new to Mahout and classification algorithms, so I'm just
> >> trying to get some input on how others might see this problem and whether
> >> I'm barking up the wrong tree.
> >>
> >> Patrick.
> >>
> >
> >
> >
> > --
> > Yee Yang Li Hector
> > http://hectorgon.blogspot.com/ (tech + travel)
> > http://hectorgon.com (book reviews)
> >
> >
>
>
> --
> Yee Yang Li Hector
> http://hectorgon.blogspot.com/ (tech + travel)
> http://hectorgon.com (book reviews)
>
