spark-dev mailing list archives

From mengxr <...@git.apache.org>
Subject [GitHub] spark pull request: MLI-1 Decision Trees
Date Mon, 10 Mar 2014 19:34:26 GMT
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/79#issuecomment-37224328
  
    @manishamde Thanks for updating the code style and adding more docs! I made a first pass
over the code.
    
    For the code style, we do not have a good style checker for Scala. @rxin can tell you more
about style checking. However, it is easy to pick up Spark's code style through code review
and make your code consistent with it in the next update. Please see my comments for some examples
and update similar code in other places.
    
    For the implementation, I have the following suggestions:
    
    1. Regression or Classification is checked in many places. It would be nice to create
a DecisionTree base class and make RegressionTree and ClassificationTree two subclasses of
it.
    
    2. For loops are used in some performance-critical code. These should be replaced by "while"
loops, which are much faster than "for" in Scala.
    
    3. Several nested methods are used in findBestSplits. It would feel safer to see some unit
tests for them.
    
    4. The threshold for classification is set at 0.5. This should be configurable.
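
    As a rough illustration of suggestions 1 and 4 together, here is a minimal sketch. All
class and method names are hypothetical and are not taken from the pull request, and treating
a leaf value strictly above the threshold as the positive class is an assumption:

```scala
// Hypothetical sketch: names are illustrative, not the API from the pull request.
abstract class DecisionTreeSketch {
  // Turn the raw value stored at a leaf (e.g. the mean label) into a prediction.
  def predictFromLeaf(leafValue: Double): Double
}

// Regression returns the leaf value unchanged.
class RegressionTreeSketch extends DecisionTreeSketch {
  override def predictFromLeaf(leafValue: Double): Double = leafValue
}

// Classification compares the leaf value to a configurable threshold.
// The default of 0.5 matches the value currently hard-coded.
class ClassificationTreeSketch(val threshold: Double = 0.5) extends DecisionTreeSketch {
  require(threshold >= 0.0 && threshold <= 1.0, "threshold must be in [0, 1]")

  // Assumption: values strictly above the threshold map to class 1.0.
  override def predictFromLeaf(leafValue: Double): Double =
    if (leafValue > threshold) 1.0 else 0.0
}
```

    With this shape, the regression/classification check happens once, when the subclass is
chosen, instead of being repeated throughout the training code.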
    
    I will try to make a second pass focusing on the algorithm later today. In the meantime,
would you please fix the remaining code style problems and the for loops? Thanks!
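
    For reference, the for-to-while rewrite could look like the sketch below. The object and
method names are hypothetical and not from the pull request. How much faster the while version
runs depends on the Scala compiler version and the JIT, so it is worth measuring on the actual
hot paths:

```scala
// Hypothetical helper, not taken from the pull request.
object LoopSketch {
  // "for" version: the compiler rewrites this into Range.foreach with a closure.
  def sumFor(values: Array[Double]): Double = {
    var sum = 0.0
    for (i <- 0 until values.length) {
      sum += values(i)
    }
    sum
  }

  // "while" version: a plain index loop with no Range or closure involved.
  def sumWhile(values: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < values.length) {
      sum += values(i)
      i += 1
    }
    sum
  }
}
```

    Both methods return the same result; only the loop construct differs.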


