mahout-dev mailing list archives

From "Isabel Drost (JIRA)" <>
Subject [jira] Commented: (MAHOUT-124) Online Classification using HBase
Date Tue, 07 Jul 2009 19:26:14 GMT


Isabel Drost commented on MAHOUT-124:

Some initial comments on the patch:

org/apache/mahout/utils/ - The methods are missing documentation. In interfaces,
you can omit the public modifier on methods, since interface methods are implicitly public.
In classes implementing such an interface, you might want to at least use {@inheritDoc} to
link back to the original documentation. Please also note in the class comment whether your
implementation is safe to use in a multi-threaded context.
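
As a rough sketch of the documentation style I have in mind (FeatureWeights and
InMemoryFeatureWeights are made-up names for illustration, they are not taken from the patch):

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup interface; the name and methods are illustrative only. */
interface FeatureWeights {

  /**
   * Returns the weight stored for a feature.
   *
   * @param feature name of the feature
   * @return the stored weight, or 0.0 if the feature has never been set
   */
  double getWeight(String feature); // implicitly public, no modifier needed

  /**
   * Stores the weight for a feature, replacing any previous value.
   *
   * @param feature name of the feature
   * @param weight  the new weight
   */
  void setWeight(String feature, double weight);
}

/**
 * Heap-backed implementation of {@link FeatureWeights}.
 *
 * <p>Not thread-safe: the backing HashMap is unsynchronized, so callers that
 * share an instance between threads must synchronize externally.
 */
class InMemoryFeatureWeights implements FeatureWeights {

  private final Map<String, Double> weights = new HashMap<String, Double>();

  /** {@inheritDoc} */
  @Override
  public double getWeight(String feature) {
    Double weight = weights.get(feature);
    return weight == null ? 0.0 : weight;
  }

  /** {@inheritDoc} */
  @Override
  public void setWeight(String feature, double weight) {
    weights.put(feature, weight);
  }
}
```

The interface carries the full description once, and the implementing class only adds what is
specific to it, here the thread-safety note.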

org.apache.mahout.common.Model - To me it looks a bit odd to add a dependency on HBase directly
to the model. I would prefer the HBase implementation to be less tightly coupled to the
core code. Currently the model is really doing two tasks at once: implementing
an in-memory model as well as an HBase model. I think it should be possible to refactor the
code so that the two are separated into distinct classes that can then be used interchangeably.
My first guess is that the strategy pattern would help with this task.
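
A minimal sketch of how such a split could look. WeightStore, InMemoryWeightStore and
SketchModel are hypothetical names I am using for illustration; they do not exist in the patch
or in Mahout, and the real model has more state than a single weight table:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical storage strategy: the only thing the model knows about its backend. */
interface WeightStore {
  /** Returns the weight for (label, feature), or 0.0 if none is stored. */
  double get(String label, String feature);

  /** Stores the weight for (label, feature), replacing any previous value. */
  void put(String label, String feature, double weight);
}

/** Strategy that keeps all weights on the heap. Not thread-safe. */
class InMemoryWeightStore implements WeightStore {
  private final Map<String, Double> weights = new HashMap<String, Double>();

  private static String key(String label, String feature) {
    return label + "\t" + feature; // assumes labels and features contain no tab
  }

  @Override
  public double get(String label, String feature) {
    Double weight = weights.get(key(label, feature));
    return weight == null ? 0.0 : weight;
  }

  @Override
  public void put(String label, String feature, double weight) {
    weights.put(key(label, feature), weight);
  }
}

// An HBase-backed store would be a second class implementing WeightStore,
// living outside the core code so that only it depends on the HBase client.

/** Model logic written against the strategy interface only. */
class SketchModel {
  private final WeightStore store;

  SketchModel(WeightStore store) {
    this.store = store;
  }

  /** Adds delta to the current weight of (label, feature). */
  void addWeight(String label, String feature, double delta) {
    store.put(label, feature, store.get(label, feature) + delta);
  }

  double weight(String label, String feature) {
    return store.get(label, feature);
  }
}
```

With this shape, switching between the in-memory and the HBase variant is a matter of passing a
different WeightStore to the constructor, and the model classes themselves no longer import
anything from HBase.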

You probably will have to refactor CBayesModel and BayesModel as well. The same applies to
org/apache/mahout/classifier/ and CBayesModel, Model, BayesTfIdfDriver, BayesTfIDFReducer,

org.apache.mahout.classifier.cbase - I really like your additions for reporting progress back
to Hadoop. I would suggest splitting these out of the patch, opening a separate issue and
attaching the changes there. This would keep this patch more focussed on the original task
of adding HBase support.

org.apache.mahout.classifier.cbase.CBayesModel - Please remove the code you commented out
if you do not need it anymore. When you catch an IOException, you should at least log
a warning message (e.g. line 60).
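
Roughly what I mean by logging instead of swallowing the exception. WeightLoader and
readWeight are made-up names, and I am using java.util.logging here only to keep the sketch
self-contained; use whatever logging facade the surrounding code already uses:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper, for illustration only. */
class WeightLoader {
  private static final Logger LOG = Logger.getLogger(WeightLoader.class.getName());

  /**
   * Parses the first line of the source as a double.
   * Returns the fallback if the source is empty or cannot be read;
   * a read failure is logged as a warning rather than silently ignored.
   * A malformed number still throws NumberFormatException.
   */
  static double readWeight(Reader source, double fallback) {
    try {
      BufferedReader reader = new BufferedReader(source);
      String line = reader.readLine();
      return line == null ? fallback : Double.parseDouble(line.trim());
    } catch (IOException e) {
      // Keep the exception in the log record so the stack trace is not lost.
      LOG.log(Level.WARNING, "Could not read weight, using fallback " + fallback, e);
      return fallback;
    }
  }
}
```

Whether falling back to a default is acceptable at that point in CBayesModel is a separate
question; the point is only that the failure should leave a trace in the logs.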

> Online Classification using HBase
> ---------------------------------
>                 Key: MAHOUT-124
>                 URL:
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>    Affects Versions: 0.2
>            Reporter: Robin Anil
>         Attachments: MAHOUT-124-July-6.patch, MAHOUT-124-June-23.patch
> #       Batch classification of flat file documents and flat file model:
> #       Storing the model in HBase at the end of the Model Building Map/Reduce stages
> #       Using the model stored in HBase, create an interface (both command line and web
> service) to classify a given document
> #       Using the model stored in HBase, batch classify documents stored on the HDFS

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
