mahout-user mailing list archives

From Frank Scholten <fr...@frankscholten.nl>
Subject SGD classifier demo app
Date Mon, 03 Feb 2014 20:33:42 GMT
Hi all,

I am exploring Mahout's SGD classifier and would like some feedback because
I think I didn't configure things properly.

I created an example app that trains an SGD classifier on the 'bank
marketing' dataset from UCI:
http://archive.ics.uci.edu/ml/datasets/Bank+Marketing

My app is at: https://github.com/frankscholten/mahout-sgd-bank-marketing

The app reads a CSV file of telephone calls, encodes the features into a
vector and tries to predict whether a customer answers yes to a business
proposal.

I do a few runs and measure accuracy, but I don't trust the results.
When I use only an intercept term as a feature I get around 88% accuracy,
and when I add all features it drops to around 85%. Is this perhaps because
the dataset is highly unbalanced? Most customers answer no. Or is the
classifier biased to predict 0 as the target code when it doesn't have any
data to go on?
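
One way to check the imbalance explanation is to compare against a baseline
that always predicts "no" and to look at per-class recall next to accuracy.
Below is a minimal sketch in plain Java. It does not use Mahout, the class
and method names are hypothetical, and the 88/12 label split is made up for
illustration rather than taken from the dataset.

```java
import java.util.Arrays;

public class ImbalanceCheck {

    /** Fraction of predictions that match the true labels. */
    public static double accuracy(int[] actual, int[] predicted) {
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == predicted[i]) {
                correct++;
            }
        }
        return (double) correct / actual.length;
    }

    /** Fraction of examples whose true label is `label` that were predicted as `label`. */
    public static double recall(int[] actual, int[] predicted, int label) {
        int total = 0;
        int hit = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == label) {
                total++;
                if (predicted[i] == label) {
                    hit++;
                }
            }
        }
        // Return 0 when the class never occurs, to avoid dividing by zero.
        return total == 0 ? 0.0 : (double) hit / total;
    }

    public static void main(String[] args) {
        // Made-up labels (assumption): 88 "no" (0) followed by 12 "yes" (1).
        int[] actual = new int[100];
        Arrays.fill(actual, 88, 100, 1);

        // Baseline that ignores all features and always predicts "no" (0).
        int[] alwaysNo = new int[100];

        System.out.println("accuracy     = " + accuracy(actual, alwaysNo));
        System.out.println("recall (no)  = " + recall(actual, alwaysNo, 0));
        System.out.println("recall (yes) = " + recall(actual, alwaysNo, 1));
    }
}
```

With these made-up labels the always-"no" baseline is right on 88 of 100
examples while finding none of the "yes" cases, so a high accuracy alone
says little here. If the intercept-only model behaves like this baseline,
per-class recall or AUC would probably be a more informative measure.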

Any other comments about my code or improvements I can make in the app are
welcome! :)

Cheers,

Frank
