hivemall-issues mailing list archives

From takuti <...@git.apache.org>
Subject [GitHub] incubator-hivemall pull request #52: [HIVEMALL-78] Implement AUC UDAF for bi...
Date Tue, 28 Feb 2017 06:23:02 GMT
GitHub user takuti opened a pull request:

    https://github.com/apache/incubator-hivemall/pull/52

    [HIVEMALL-78] Implement AUC UDAF for binary classification

    ## What changes were proposed in this pull request?
    
    In addition to the existing `auc(array, array)` for ranking (myui/hivemall#326), this patch adds `auc(double, double)` for binary classification.
    
    ## What type of PR is it?
    
    Feature
    
    ## What is the Jira issue?
    
    https://issues.apache.org/jira/browse/HIVEMALL-78
    
    ## How was this patch tested?
    
    Created a unit test for the UDAF and confirmed that it passes:
    
    ```
    $ mvn -Dtest=hivemall.evaluation.AUCUDAFTest test
    ```
    
    I also ran manual tests with the following queries:
    
    ```sql
    with data as (
      select 0.5 as prob, 0 as label
      union all
      select 0.3 as prob, 1 as label
      union all
      select 0.2 as prob, 0 as label
      union all
      select 0.8 as prob, 1 as label
      union all
      select 0.7 as prob, 1 as label
    ), data_ordered as (
      select prob, label
      from data
      order by prob desc
    )
    select auc(prob, label)
    from (
      select prob, label
      from data_ordered
      distribute by floor(prob / 0.2)
    ) t;
    ```
    
    ```sql
    with data as (
      select 0.5 as prob, 0 as label
      union all
      select 0.3 as prob, 1 as label
      union all
      select 0.2 as prob, 0 as label
      union all
      select 0.8 as prob, 1 as label
      union all
      select 0.7 as prob, 1 as label
    ), data_ordered as (
      select prob, label
      from data
      order by prob desc
    )
    select auc(prob, label)
    from data_ordered;
    ```
    
    Both queries returned `AUC=0.83333`. This matches the result of [scikit-learn's roc_auc_score()](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html):
    
    ```
    >>> roc_auc_score([0,1,0,1,1],[0.5,0.3,0.2,0.8,0.7])
    0.83333333333333326
    ```
    
    ## How to use this feature?
    
    See the queries above. Input data must be sorted by score in descending order.
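
To illustrate why the descending order matters, here is a minimal Python sketch of a single-pass, trapezoid-based AUC over rows that are already sorted by score. It is not the UDAF code from this patch; the function name `auc_from_sorted` and the tie handling are assumptions made for illustration only. It reproduces the value reported above for the same five rows.

```python
def auc_from_sorted(rows):
    """Compute ROC AUC in one pass.

    `rows` is an iterable of (score, label) pairs that is ALREADY sorted by
    score in descending order, with label 1 for positive and 0 for negative.
    Hypothetical helper for illustration; not the Hivemall implementation.
    """
    tp = fp = 0            # running true/false positive counts
    tp_prev = fp_prev = 0  # counts at the previous distinct score
    area = 0.0             # unnormalized area under the ROC curve
    prev_score = None

    for score, label in rows:
        # Close a trapezoid whenever the score changes, so that tied
        # scores are treated as a single threshold.
        if prev_score is not None and score != prev_score:
            area += (fp - fp_prev) * (tp + tp_prev) / 2.0
            tp_prev, fp_prev = tp, fp
        if label == 1:
            tp += 1
        else:
            fp += 1
        prev_score = score

    # Close the final trapezoid.
    area += (fp - fp_prev) * (tp + tp_prev) / 2.0

    if tp == 0 or fp == 0:
        raise ValueError("AUC is undefined without both classes")
    # Normalize by (#positives * #negatives).
    return area / (tp * fp)


# Same data as the manual test queries, sorted by prob descending.
data = [(0.5, 0), (0.3, 1), (0.2, 0), (0.8, 1), (0.7, 1)]
data_ordered = sorted(data, key=lambda r: r[0], reverse=True)
auc = auc_from_sorted(data_ordered)
print(round(auc, 5))  # 0.83333
```

Because the running counts are only meaningful when thresholds are visited from the highest score to the lowest, passing unsorted rows to a function like this would in general give a different number.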


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/takuti/incubator-hivemall auc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/52.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #52
    
----
commit e60ff231e07aa515666ec7f4863ed1c8401e0e27
Author: Takuya Kitazawa <k.takuti@gmail.com>
Date:   2017-02-28T06:08:33Z

    Implement AUCUDAF

commit 4756f463700740af0bd51ab7a25e383649a2d504
Author: Takuya Kitazawa <k.takuti@gmail.com>
Date:   2017-02-28T06:09:18Z

    Add unit test of AUCUDAF for classification

----


