spark-issues mailing list archives

From "Xiangrui Meng (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-7921) Change includeFirst to dropLast in OneHotEncoder
Date Thu, 28 May 2015 19:06:19 GMT
Xiangrui Meng created SPARK-7921:
------------------------------------

             Summary: Change includeFirst to dropLast in OneHotEncoder
                 Key: SPARK-7921
                 URL: https://issues.apache.org/jira/browse/SPARK-7921
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 1.4.0
            Reporter: Xiangrui Meng
            Assignee: Xiangrui Meng


Change includeFirst to dropLast and leave the default as true. There are a couple of benefits:

a. consistent with other tutorials of one-hot encoding (or dummy coding) (e.g., http://www.ats.ucla.edu/stat/mult_pkg/faq/general/dummy.htm)
b. keeps the indices unmodified in the output vector. If we dropped the first category
instead, all indices would be shifted by 1.
c. If users use StringIndexer, the last element is the least frequent one.
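
To make points (b) and (c) concrete, here is a minimal plain-Python sketch. It does not use Spark's API: the helpers index_by_frequency, one_hot_drop_last and one_hot_drop_first are hypothetical names made up for illustration, and the frequency-ordered indexing only approximates what StringIndexer is described as doing above.

```python
from collections import Counter


def index_by_frequency(values):
    """Hypothetical stand-in for a frequency-ordered indexer:
    the most frequent label gets index 0, the least frequent gets the last index."""
    counts = Counter(values)
    # Sort by descending count; break ties alphabetically so the result is deterministic.
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: i for i, label in enumerate(ordered)}


def one_hot_drop_last(index, num_categories):
    """Encode into num_categories - 1 slots, dropping the last category.
    Category i (if kept) lands in slot i, so indices are unchanged."""
    vec = [0.0] * (num_categories - 1)
    if index < num_categories - 1:
        vec[index] = 1.0
    return vec


def one_hot_drop_first(index, num_categories):
    """Encode into num_categories - 1 slots, dropping the first category.
    Category i (if kept) lands in slot i - 1, so every index shifts by 1."""
    vec = [0.0] * (num_categories - 1)
    if index > 0:
        vec[index - 1] = 1.0
    return vec


labels = ["a", "b", "a", "c", "a", "b"]
mapping = index_by_frequency(labels)

for label, idx in sorted(mapping.items(), key=lambda kv: kv[1]):
    print(label, idx, one_hot_drop_last(idx, 3), one_hot_drop_first(idx, 3))
```

With drop-last, the rarest label ("c" here) becomes the all-zeros reference category and the remaining labels keep their original positions. With drop-first, the most frequent label becomes the reference and every other position moves down by one.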



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


