spark-reviews mailing list archives

From mengxr <...@git.apache.org>
Subject [GitHub] spark pull request: [MLLIB] [WIP] SPARK-1473: Feature selection fo...
Date Mon, 04 Aug 2014 16:58:41 GMT
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/1484#issuecomment-51086836
  
    @avulanov I have the same concern about calling `transform` before `fit`. There are two
options: 1) throw an error, or 2) fit on the same dataset and then transform (`fit_transform` in
scikit-learn). I don't have a strong preference for either one.
    
    I want to add another candidate to what you proposed:
    
    ~~~
    class ChiSquaredFeatureSelection {
      def fit(dataset: RDD[LabeledPoint], numFeatures: Int): ChiSquaredFeatureSelector
    }
    
    class ChiSquaredFeatureSelector {
      def transform(dataset: RDD[LabeledPoint]): RDD[LabeledPoint]
    }
    ~~~
    
    We can discuss the class hierarchy later since they are not user-facing.
    
    A problem with all the candidates here is that we cannot apply the same transformation to `RDD[Vector]`,
which is required for prediction. I'm thinking about something like the following:
    
    ~~~
    class ChiSquaredFeatureSelection {
      def fit[T <: Vectorized with Labeled](dataset: RDD[T], numFeatures: Int): ChiSquaredFeatureSelector
    }
    
    class ChiSquaredFeatureSelector {
      def transform[T <: Vectorized](dataset: RDD[T]): RDD[T]
    }
    ~~~
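    
    A minimal sketch of how the bounded signatures above could work, under stated assumptions:
`Vectorized` and `Labeled` are hypothetical traits that do not exist in MLlib, `Seq` stands in for
`RDD` and `Array[Double]` for `Vector` so that it runs without Spark, and the ranking inside `fit`
is a placeholder rather than a chi-squared test. `Vectorized` also takes a self type parameter here,
which is an addition to the signatures above: `transform` needs some way to rebuild a `T` from the
reduced features in order to return `RDD[T]`.
    
    ~~~scala
    // Hypothetical traits, not part of MLlib.
    trait Vectorized[Self] {
      def features: Array[Double]
      // Returns a copy of this record with its features replaced.
      def withFeatures(fs: Array[Double]): Self
    }
    
    trait Labeled {
      def label: Double
    }
    
    // Unlabeled record, as used at prediction time.
    case class FeatureVector(features: Array[Double]) extends Vectorized[FeatureVector] {
      def withFeatures(fs: Array[Double]): FeatureVector = copy(features = fs)
    }
    
    // Labeled record, as used at training time.
    case class Point(label: Double, features: Array[Double])
        extends Vectorized[Point] with Labeled {
      def withFeatures(fs: Array[Double]): Point = copy(features = fs)
    }
    
    class ChiSquaredFeatureSelector(val selected: Array[Int]) {
      // Works for any record that exposes features, labeled or not.
      def transform[T <: Vectorized[T]](dataset: Seq[T]): Seq[T] =
        dataset.map(p => p.withFeatures(selected.map(i => p.features(i))))
    }
    
    class ChiSquaredFeatureSelection {
      // Fitting requires labels, hence the additional Labeled bound.
      def fit[T <: Vectorized[T] with Labeled](
          dataset: Seq[T], numFeatures: Int): ChiSquaredFeatureSelector = {
        // Placeholder, NOT a chi-squared test: keeps the first numFeatures indices.
        val dim = dataset.headOption.map(_.features.length).getOrElse(0)
        new ChiSquaredFeatureSelector((0 until math.min(numFeatures, dim)).toArray)
      }
    }
    
    val train = Seq(Point(1.0, Array(0.1, 0.2, 0.3)), Point(0.0, Array(0.4, 0.5, 0.6)))
    val selector = new ChiSquaredFeatureSelection().fit(train, numFeatures = 2)
    val reducedTrain = selector.transform(train) // Seq[Point], labels preserved
    val reducedTest = selector.transform(Seq(FeatureVector(Array(0.7, 0.8, 0.9)))) // Seq[FeatureVector]
    ~~~
    
    With this shape the same selector handles both the labeled training set and the unlabeled
prediction input. Since `fit` is the only way to obtain a selector here, `transform` cannot be
called before `fit`.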



