spark-reviews mailing list archives

From mengxr <...@git.apache.org>
Subject [GitHub] spark pull request: [SPARK-8341] Significant selector feature tran...
Date Fri, 17 Jul 2015 07:21:39 GMT
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/6795#issuecomment-122201258
  
    @catap Some high-level comments:
    
    1. The name `SignificantSelector` does not reflect what it really does. There are many ways to define feature significance. It is hard to guess that this one means removing constant columns.
    2. The example you mentioned doesn't fully support this transformer. If you don't want zero columns, use bag of words. If there are many values and you have to use hashing, this transformer doesn't really help reduce the number of columns.
    
    I don't see how this transformer could help real-world use cases. But if you have seen reference use cases, please let me know. Thanks!
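
For readers without the pull request at hand, the behavior described in point 1 (removing constant columns) can be sketched roughly as follows. This is a plain-Python illustration, not the code from the pull request and not a Spark API; the function name `drop_constant_columns` and the row-list input format are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def drop_constant_columns(
    rows: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Hypothetical helper: drop columns whose value is the same in every row.

    Returns the indices of the columns that were kept, plus the filtered rows.
    """
    if not rows:
        return [], []

    first = rows[0]
    n_cols = len(first)

    # A column is kept only if at least one row differs from the first row
    # in that position, i.e. the column is not constant.
    kept = [
        j for j in range(n_cols)
        if any(row[j] != first[j] for row in rows)
    ]

    filtered = [[row[j] for j in kept] for row in rows]
    return kept, filtered


if __name__ == "__main__":
    data = [
        [1.0, 0.0, 3.0],
        [2.0, 0.0, 3.0],
        [5.0, 0.0, 3.0],
    ]
    kept, filtered = drop_constant_columns(data)
    print(kept)
    print(filtered)
```

In this sketch only the first column varies, so the all-zero column and the all-3.0 column are both dropped, which is the sense in which the comment says the transformer removes constant columns.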



