spark-reviews mailing list archives

From crackcell <...@git.apache.org>
Subject [GitHub] spark pull request #17233: [SPARK-11569][ML] Fix StringIndexer to handle nul...
Date Fri, 10 Mar 2017 05:06:38 GMT
GitHub user crackcell opened a pull request:

    https://github.com/apache/spark/pull/17233

    [SPARK-11569][ML] Fix StringIndexer to handle null value properly

    ## What changes were proposed in this pull request?
    
    This PR enhances StringIndexer to handle NULL values.
    
    Before this PR, StringIndexer throws an exception when it encounters NULL values.
    With this PR:
    - handleInvalid=error: Throw an exception as before
    - handleInvalid=skip: Skip null values as well as unseen labels
    - handleInvalid=keep: Assign null values an additional index, as is done for unseen labels
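The three `handleInvalid` modes above can be illustrated with a small pure-Python model. This is a hypothetical sketch, not Spark's implementation: `fit_labels`, `transform`, and `InvalidLabelError` are illustrative names. It assumes that nulls are ignored when fitting, that labels are ordered by descending frequency (ties broken alphabetically), and that `keep` maps both nulls and unseen labels to one extra index equal to the number of fitted labels.

```python
from collections import Counter
from typing import List, Optional


class InvalidLabelError(Exception):
    """Hypothetical error for handle_invalid="error" (Spark raises its own exception)."""


def fit_labels(values: List[Optional[str]]) -> List[str]:
    """Learn the label vocabulary from training values.

    Assumption: nulls are dropped before counting, and labels are ordered by
    descending frequency with ties broken alphabetically.
    """
    counts = Counter(v for v in values if v is not None)
    return sorted(counts, key=lambda label: (-counts[label], label))


def transform(values: List[Optional[str]], labels: List[str],
              handle_invalid: str = "error") -> List[float]:
    """Map values to indices, treating nulls like unseen labels."""
    if handle_invalid not in ("error", "skip", "keep"):
        raise ValueError(f"unsupported handle_invalid: {handle_invalid!r}")
    label_to_index = {label: i for i, label in enumerate(labels)}
    extra_index = len(labels)  # assumed shared slot for nulls and unseen labels
    indices: List[float] = []
    for v in values:
        # None is never a fitted label, so nulls always take the invalid path.
        if v in label_to_index:
            indices.append(float(label_to_index[v]))
        elif handle_invalid == "keep":
            indices.append(float(extra_index))
        elif handle_invalid == "skip":
            continue  # drop the row, as "skip" already does for unseen labels
        else:
            what = "NULL value" if v is None else f"unseen label {v!r}"
            raise InvalidLabelError(f"StringIndexer encountered {what}")
    return indices


if __name__ == "__main__":
    labels = fit_labels(["a", "b", "a", None, "c", "a", "b"])
    print(labels)                                            # ['a', 'b', 'c']
    print(transform(["a", None, "d", "c"], labels, "keep"))  # [0.0, 3.0, 3.0, 2.0]
    print(transform(["a", None, "d", "c"], labels, "skip"))  # [0.0, 2.0]
```

In this model a null simply fails the vocabulary lookup, so it follows exactly the same path as an unseen label under each mode, which is the behavior the list above describes.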
    
    BTW, I noticed someone tried to solve the same problem (#9920), but it seems to have had
    no progress or response for a long time. Would you mind giving me a chance to solve it?
    
    ## How was this patch tested?
    
    New unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/crackcell/spark 11569_StringIndexer_NULL

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17233.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17233
    
----
commit 75e3975597aa6271f4f8ab688922edda88b03045
Author: Menglong TAN <tanmenglong@gmail.com>
Date:   2017-03-08T03:50:17Z

    Merge pull request #1 from apache/master
    
    merge master to my repo

commit 79d706085e8371fb1724ce73377767c38d551e5d
Author: Menglong TAN <tanmenglong@renrenche.com>
Date:   2017-03-10T04:45:56Z

    Enhance StringIndexer with NULL values

commit 0cb121c65f592b9623bdeef2746d7c2a3c281ae1
Author: Menglong TAN <tanmenglong@renrenche.com>
Date:   2017-03-10T04:52:30Z

    filter out NULLs when transform dataset

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

