spark-issues mailing list archives

From "Felix Cheung (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-20619) StringIndexer supports multiple ways of label ordering
Date Fri, 12 May 2017 07:14:04 GMT

     [ https://issues.apache.org/jira/browse/SPARK-20619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felix Cheung resolved SPARK-20619.
----------------------------------
          Resolution: Fixed
            Assignee: Wayne Zhang
       Fix Version/s: 2.3.0
    Target Version/s: 2.3.0

> StringIndexer supports multiple ways of label ordering
> ------------------------------------------------------
>
>                 Key: SPARK-20619
>                 URL: https://issues.apache.org/jira/browse/SPARK-20619
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.1.0
>            Reporter: Wayne Zhang
>            Assignee: Wayne Zhang
>             Fix For: 2.3.0
>
>
> StringIndexer maps labels to numbers in descending order of label frequency.
> Other orderings (e.g., alphabetical) may be needed in feature ETL, because the
> ordering affects downstream results such as one-hot encoding and RFormula. We
> propose supporting other ordering methods by adding a parameter, stringOrderType,
> with the following four options:
>    - 'freq_desc': descending order by label frequency (most frequent label assigned 0)
>    - 'freq_asc': ascending order by label frequency (least frequent label assigned 0)
>    - 'alphabet_desc': descending alphabetical order
>    - 'alphabet_asc': ascending alphabetical order
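A minimal sketch of the four proposed orderings, written in plain Python over a list of string labels rather than a Spark DataFrame. `index_labels` and `one_hot` are hypothetical helpers, not Spark APIs. The option strings follow this proposal, and the parameter values in the released API may be spelled differently. Breaking frequency ties alphabetically is an assumption; the issue does not specify tie handling. `one_hot` assumes a drop-last convention, where the highest index becomes the all-zeros reference level, to show why the chosen ordering changes the one-hot result.

```python
from collections import Counter


def index_labels(labels, order_type="freq_desc"):
    """Hypothetical helper: map each distinct label to an index 0..k-1
    using one of the four orderings proposed in SPARK-20619."""
    counts = Counter(labels)
    # Alphabetical baseline. Python's sort is stable, so the frequency
    # sorts below keep alphabetical order among tied labels (an assumption).
    distinct = sorted(counts)
    if order_type == "freq_desc":
        ordered = sorted(distinct, key=lambda s: -counts[s])  # most frequent -> 0
    elif order_type == "freq_asc":
        ordered = sorted(distinct, key=lambda s: counts[s])   # least frequent -> 0
    elif order_type == "alphabet_desc":
        ordered = sorted(distinct, reverse=True)
    elif order_type == "alphabet_asc":
        ordered = distinct
    else:
        raise ValueError(f"unsupported order type: {order_type}")
    return {label: i for i, label in enumerate(ordered)}


def one_hot(index, num_categories, drop_last=True):
    """Hypothetical helper: dense one-hot vector. With drop_last=True the
    category holding the highest index is encoded as all zeros."""
    size = num_categories - 1 if drop_last else num_categories
    vec = [0] * size
    if index < size:
        vec[index] = 1
    return vec
```

For the labels ["c", "a", "c", "b", "c", "a"], freq_desc maps "c" to 0 and "b" to 2, while alphabet_asc maps "a" to 0 and "c" to 2. With drop_last=True, the label holding the highest index ("b" in the first case, "c" in the second) is the one encoded as all zeros, so the ordering decides which category serves as the reference level.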





