madlib-dev mailing list archives

From Swatisoni <...@git.apache.org>
Subject [GitHub] madlib pull request #223: Balance datasets : re-sampling technique
Date Wed, 10 Jan 2018 22:33:35 GMT
GitHub user Swatisoni opened a pull request:

    https://github.com/apache/madlib/pull/223

    Balance datasets : re-sampling technique

    JIRA:MADLIB-1168
    
    Additional Authors:
    Orhan Kislal okislal@pivotal.io
    Jingyi Mei jmei@pivotal.io
    
    Phase 1 and Phase 2 implementation of balanced datasets, which performs balanced
    sampling using the following re-sampling techniques:
        1. Under-sampling the majority class(es), with and without replacement
        2. Over-sampling the minority class
        3. Combining over- and under-sampling
            - Uniform sampling of all classes (default case)
        4. Creating ensemble balanced sets
            - Re-sampling given a comma-delimited string of specific classes and their
              respective sample sizes
        5. Install check (IC) tests

    Balanced sampling with grouping functionality will be implemented in Phase 3.
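
For readers unfamiliar with the three basic techniques listed above, here is a minimal sketch in plain Python. It is not the code in this pull request and does not reflect MADlib's SQL interface; the function name `balance_sample`, its `class_of` and `strategy` parameters, and the choice to shrink classes without replacement are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def balance_sample(rows, class_of, strategy="uniform", seed=None):
    """Return a class-balanced sample of rows (hypothetical helper).

    strategy:
      "undersample" - shrink every class to the size of the smallest class
      "oversample"  - grow every class to the size of the largest class
      "uniform"     - give every class an equal share of the original row count
    """
    rows = list(rows)
    if not rows:
        return []

    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row in rows:
        by_class[class_of(row)].append(row)

    sizes = [len(members) for members in by_class.values()]
    if strategy == "undersample":
        target = min(sizes)
    elif strategy == "oversample":
        target = max(sizes)
    elif strategy == "uniform":
        target = len(rows) // len(by_class)
    else:
        raise ValueError("unknown strategy: %r" % (strategy,))

    sample = []
    for members in by_class.values():
        if len(members) >= target:
            # Enough rows: draw the target number without replacement.
            sample.extend(rng.sample(members, target))
        else:
            # Too few rows: keep all of them, then top up with replacement.
            sample.extend(members)
            sample.extend(rng.choices(members, k=target - len(members)))
    return sample
```

With 90 rows of one class and 10 of another, "undersample" yields 10 rows per class, "oversample" yields 90 per class, and "uniform" yields 50 per class, so the combined case both shrinks the majority class and grows the minority class.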

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Swatisoni/madlib balanced_sets_final

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/223.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #223
    
----
commit 3b2d1f18b9cf5ef8f78669678d82dc29cd11812b
Author: Swatisoni <soniswati.2010@...>
Date:   2018-01-10T20:07:36Z

    Balance datasets : re-sampling technique
    
    JIRA:MADLIB-1168
    
    Additional Authors:
    Orhan Kislal okislal@pivotal.io
    Jingyi Mei jmei@pivotal.io
    
    Phase 1 and Phase 2 implementation of balanced datasets, which performs balanced
    sampling using the following re-sampling techniques:
        1. Under-sampling the majority class(es), with and without replacement
        2. Over-sampling the minority class
        3. Combining over- and under-sampling
            - Uniform sampling of all classes (default case)
        4. Creating ensemble balanced sets
            - Re-sampling given a comma-delimited string of specific classes and their
              respective sample sizes
        5. Install check (IC) tests

    Balanced sampling with grouping functionality will be implemented in Phase 3.

----

