mahout-user mailing list archives

From Patrick Collins <patrick.coll...@ready2sign.com>
Subject Fuzzy logic and Heuristics vs Classification
Date Tue, 28 Jun 2011 00:51:13 GMT
Has anyone got any advice on how to combine heuristics and classification?

When preparing my data to build out the features to feed into my 
classification model, I keep noticing patterns of text which I know 
imply a certain outcome with 99.99% probability.

How would you construct the data/features in order to pre-classify this 
data, so that the classifier is much more likely to come to the 
"correct" conclusion?

For example, I remember seeing an anti-spam machine which used a 
combination of fuzzy logic and then classification to build a better 
outcome (but its author did not detail how it was actually implemented). 
The author used a whole range of heuristics to determine that a certain 
sender is known to be a spammer rather than just blindly passing this 
data into the classifier.

In my dataset I have a LOT of patterns like this that I can identify and 
then use to determine the outcome with very high probability. I say high 
probability, but I cannot say absolutely. Ideally, if I could precompute 
a lot of this data using heuristics, I could feed this information into 
the classifier to greatly reduce the number of features. But the 
classifiers do not let me assign a "weight" to a certain feature.
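
To make the idea concrete, here is a minimal sketch (in Python rather 
than Mahout, and not taken from any existing system) of precomputing 
heuristics and passing each one to the classifier as an ordinary 0/1 
feature next to the usual word counts. The rule names and patterns are 
hypothetical placeholders. The assumption is that a classifier which 
learns per-feature weights would then give a reliable heuristic a large 
weight on its own, without a manual "weight" setting.

```python
import re

# Hypothetical heuristic rules: (feature name, compiled pattern).
# These patterns are invented for illustration only; real rules would
# come from the patterns observed in the actual dataset.
HEURISTIC_RULES = [
    ("h_known_spam_phrase", re.compile(r"\bfree money\b", re.IGNORECASE)),
    # No lowercase letters anywhere and at least one uppercase letter.
    ("h_all_caps", re.compile(r"^[^a-z]*[A-Z][^a-z]*$")),
]


def heuristic_features(text):
    """Return one 0/1 indicator per heuristic rule."""
    return {
        name: 1.0 if pattern.search(text) else 0.0
        for name, pattern in HEURISTIC_RULES
    }


def word_features(text):
    """Return plain bag-of-words counts, prefixed to avoid name clashes."""
    counts = {}
    for token in re.findall(r"[a-z0-9']+", text.lower()):
        key = "w_" + token
        counts[key] = counts.get(key, 0.0) + 1.0
    return counts


def build_features(text):
    """Combine word counts and heuristic indicators into one feature map."""
    features = word_features(text)
    features.update(heuristic_features(text))
    return features


if __name__ == "__main__":
    for name, value in sorted(build_features("FREE MONEY NOW").items()):
        print(name, value)
```

Because the heuristics are only inputs, a rule that is right 99.99% of 
the time can still be overruled by other evidence in the rare cases 
where it is wrong.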

Other than "well just try and see what works", I was wondering how 
people deal with this problem. Do they just leave it to the classifier 
and hope that the classifier picks up the same patterns?

I'm a bit new to Mahout and classification algorithms and so am just 
trying to get some input on how others might see this problem and 
whether I'm barking up the wrong tree.

Patrick.
