mahout-user mailing list archives

From Joan <joan.monp...@gmail.com>
Subject Problem with Custom Classifier
Date Tue, 10 Jul 2012 18:35:56 GMT
Hi,

I'm testing the 20 newsgroups example, but I have a small problem.

When I try to classify some data using the model generated by the training
step, the classifier always returns the "unknown" category.

I think the model is correct, because when I test it I can see the matrix
and its results are correct, so I don't understand why this happens.

Here is my custom classifier:


public class CustomClassifier {

    private ClassifierContext context;
    private Algorithm algorithm;
    private Datastore datastore;
    private File modelDirectory;
    Analyzer analyzer;
    BayesParameters p;

    public CustomClassifier() {
        analyzer = new DefaultAnalyzer();
    }

    public static BayesParameters setParams() {
        BayesParameters bayesParams = new BayesParameters();
        bayesParams.setGramSize(1);
        bayesParams.set("dataSource", "hdfs");
        bayesParams.set("defaultCat", "unknown");
        bayesParams.set("encoding", "UTF-8");
        bayesParams.set("alpha_i", "1.0");

        return bayesParams;
    }

    public void init(File basePath) throws FileNotFoundException,
            InvalidDatastoreException {

        algorithm = new BayesAlgorithm();
        p = setParams();

        p.set("basePath", basePath.getAbsolutePath());
        p.setGramSize(1);
        datastore = new InMemoryBayesDatastore(p);
        context = new ClassifierContext(algorithm, datastore);
        context.initialize();
    }

    public String classify() throws IOException, InvalidDatastoreException {

        StringReader reader = new StringReader(
            "Thanks to a reply from someone I looked a little further and "
            + "found what I was looking for.  The April CR magazine has most "
            + "of the above things. Despite recent articles here the ratings "
            + "looked pretty good for relative comparison purposes.  "
            + "Unfortunately the crash test comparisons didn't include half "
            + "of the cars I'm comparing. Anybody know how '93 Honda Civic "
            + "hatchbacks and Toyota Tercels fare in an accident? ");

        String[] document = BayesFileFormatter.readerToDocument(analyzer, reader);
        ClassifierResult result = context.classifyDocument(document, "unknown");

        return result.getLabel();
    }

    public static void main(String[] args) throws Exception {

        CustomClassifier cc;

        try {
            cc = new CustomClassifier();
            cc.init(new File(args[0]));

            System.out.println("Category::: " + cc.classify());

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

I can't figure out why this happens. When I look in HDFS, I see the folders
where the model is stored, and all of them have the correct structure.
If I get the labels from the classifier, all of them are empty. I discovered
that this happens when the classifier loads the model with:

    SequenceFileModelReader.loadModel(this, params, conf);

    loadFeatureWeights(datastore, new Path(params.get("sigma_j")), conf);
    loadLabelWeights(datastore, new Path(params.get("sigma_k")), conf);
    loadSumWeight(datastore, new Path(params.get("sigma_kSigma_j")), conf);
    loadThetaNormalizer(datastore, new Path(params.get("thetaNormalizer")), conf);
    loadWeightMatrix(datastore, new Path(params.get("weight")), conf);


All of the parts are loaded as empty, but they are not actually empty. Is
there any problem with SequenceFileDirIterable? I'm using the Cloudera
distribution 3u3, so Mahout version 0.5.
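
One thing that may be worth ruling out (an assumption, not a confirmed cause): the init(File) method above stores basePath.getAbsolutePath() as "basePath", and java.io.File resolves a relative argument against the local working directory, not against HDFS. If the model lives under a relative HDFS path, the stored string may point somewhere that does not exist on HDFS, and whether that matters depends on how the Hadoop configuration resolves the path. The sketch below only prints what that string would be; the class name BasePathCheck, the method resolvedBasePath and the default argument "20news-model" are hypothetical names used for illustration.

```java
import java.io.File;

public class BasePathCheck {

    // Returns the string that init(File) would store under "basePath":
    // the argument resolved by java.io.File against the local working directory.
    static String resolvedBasePath(String arg) {
        return new File(arg).getAbsolutePath();
    }

    public static void main(String[] args) {
        // "20news-model" is a hypothetical default, used only when no argument is given.
        String arg = args.length > 0 ? args[0] : "20news-model";
        System.out.println("argument: " + arg);
        System.out.println("basePath: " + resolvedBasePath(arg));
        System.out.println("user.dir: " + System.getProperty("user.dir"));
    }
}
```

Comparing the printed basePath with the directory that actually holds the model in HDFS shows whether the two agree.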


Thanks

Joan
