GitHub user njayaram2 opened a pull request:
https://github.com/apache/madlib/pull/230
Balanced sets final
Refactor the code and add a keep_null parameter.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/njayaram2/madlib balanced_sets_final
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/madlib/pull/230.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #230
----
commit 87f6ffa4c9d1fcfafda5735adc7b76561dec6d9b
Author: Swatisoni <soniswati.2010@...>
Date: 2018-01-10T20:07:36Z
Balance datasets: re-sampling technique
JIRA: MADLIB-1168
Additional Authors:
Orhan Kislal okislal@pivotal.io
Jingyi Mei jmei@pivotal.io
Phase 1 and Phase 2 implementation of balanced datasets, which performs balanced sampling
using the following re-sampling techniques:
1. Under-sampling the majority class(es), with and without replacement
2. Over-sampling the minority class
3. Combining over- and under-sampling
- Uniform sampling of all classes (default case)
4. Create ensemble balanced sets
- Re-sampling given a comma-delimited string of specific classes and their respective sample sizes
5. Install-check (IC) tests
Balanced sampling with grouping functionality will be implemented in Phase 3.
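The under- and over-sampling techniques listed above can be illustrated with a minimal Python sketch. It is not the MADlib implementation (which generates SQL); the function `resample_class` and its parameters are hypothetical. It assumes a class's rows fit in memory and that any target larger than the class (over-sampling) is drawn with replacement.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def resample_class(rows: Sequence[T], target_size: int,
                   with_replacement: bool = False,
                   seed: Optional[int] = None) -> List[T]:
    """Resample one class's rows to exactly target_size rows (hypothetical helper)."""
    if target_size < 0:
        raise ValueError("target_size must be non-negative")
    rng = random.Random(seed)
    pool = list(rows)
    if not with_replacement and target_size <= len(pool):
        # Under-sampling without replacement: a random subset of distinct rows.
        return rng.sample(pool, target_size)
    if not pool and target_size > 0:
        raise ValueError("cannot draw rows from an empty class")
    # With replacement (or over-sampling past the class size): independent
    # draws, so the same row may appear more than once.
    return rng.choices(pool, k=target_size)
```

Under-sampling with replacement corresponds to `with_replacement=True` and a target below the class count; over-sampling the minority class is any target above it.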
commit 40d1275504a107e7ae8809ab7f37f0aaa8ed0799
Author: Jingyi Mei <jmei@...>
Date: 2018-01-12T00:33:51Z
troubleshoot float issue
commit 01ade3c8dbc108237aec0866060f6ee5acaacaac
Author: Jingyi Mei <jmei@...>
Date: 2018-01-12T17:54:18Z
troubleshoot float issue 2
commit 276b3b8628488eaa281688e5115658cc8318abfa
Author: Jingyi Mei <jmei@...>
Date: 2018-01-22T19:25:07Z
Wip for refactor
commit f97e2742bb5a0150328798db064c3ab21c335def
Author: Rahul Iyer <riyer@...>
Date: 2018-01-23T01:13:16Z
Refactor sampling strategy and class counts
commit 445cbfe4d79c48c103b36f1d7bafdd171afb390b
Author: Nandish Jayaram <njayaram@...>
Date: 2018-01-23T02:04:40Z
refactor WIP
commit a0db130061bf23e857ae85d602db7f6937c71c58
Author: Nandish Jayaram <njayaram@...>
Date: 2018-01-23T20:50:40Z
update some code and add unit test cases for it
commit c15c5b537aaf233b6da906a988f8f7257fb9e83c
Author: Nandish Jayaram <njayaram@...>
Date: 2018-01-23T23:31:08Z
handle all cases in _get_target_class_sizes
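The commit above handles every case when computing per-class target sizes. A hypothetical sketch of such logic follows; `target_class_sizes` and the strategy names `undersample`, `oversample`, and `uniform` are assumptions, not the actual `_get_target_class_sizes` signature. It also assumes that "uniform" spreads the input row count evenly over the classes, and that classes missing from an explicit size mapping keep their original count.

```python
from typing import Dict, Hashable, Union


def target_class_sizes(class_counts: Dict[Hashable, int],
                       strategy: Union[str, Dict[Hashable, int]] = "uniform"
                       ) -> Dict[Hashable, int]:
    """Return the desired sample size for every class (hypothetical helper)."""
    if not class_counts:
        raise ValueError("no classes to balance")
    if isinstance(strategy, dict):
        unknown = set(strategy) - set(class_counts)
        if unknown:
            raise ValueError(f"unknown classes: {sorted(map(str, unknown))}")
        # Explicit sizes for the listed classes; the rest keep their counts.
        return {c: strategy.get(c, n) for c, n in class_counts.items()}
    if strategy == "undersample":
        size = min(class_counts.values())  # shrink to the smallest class
    elif strategy == "oversample":
        size = max(class_counts.values())  # grow to the largest class
    elif strategy == "uniform":
        # Same total as the input, split evenly (any remainder is dropped).
        size = sum(class_counts.values()) // len(class_counts)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return {c: size for c in class_counts}
```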
commit 1e4165eae46200109219920df276774c2b44ec29
Author: Rahul Iyer <riyer@...>
Date: 2018-01-24T01:26:59Z
Refactor get_target_sizes function
commit 36e4b75edc48ed3e7c7f64ad1944b0d12db191a4
Author: Nandish Jayaram <njayaram@...>
Date: 2018-01-24T18:29:12Z
add comments, rename some variables, and undo changes to utilities.py_in from a previous commit
commit e643e6d8a70ad211152a6a41b64e6b35eee31f32
Author: Nandish Jayaram <njayaram@...>
Date: 2018-01-24T23:26:06Z
done with creating strategy-specific count dict and subquery for no sample
commit b50a13330e3b7ceff71165f1385d7117f0f1a047
Author: Rahul Iyer <riyer@...>
Date: 2018-01-25T21:42:46Z
Add with_replacement subquery generation
commit b37a775af9e816cded3e3f35b8ada3bb8e9fbcf0
Author: Nandish Jayaram <njayaram@...>
Date: 2018-01-26T00:29:23Z
most coding done; yet to test.
commit 02113b98b670e0118d1766215f8d9619d951c2d3
Author: Nandish Jayaram <njayaram@...>
Date: 2018-01-26T19:02:29Z
fix issues (WIP)
commit 714db2a68afc325292748b3146afc07d81ab813e
Author: Rahul Iyer <riyer@...>
Date: 2018-01-26T20:26:23Z
Fix some errors to pass IC, update docs
commit 1dc6c0600b171a475b74e19fa5d39fbd070313ec
Author: Nandish Jayaram <njayaram@...>
Date: 2018-01-27T01:27:46Z
replace Poisson-based sampling with row_number()
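One plausible reading of this change: Poisson-based sampling emits each row a random number of times, so the sample size is only correct in expectation, whereas numbering randomly ordered rows with row_number() and keeping those at or below the target yields an exact size per class. The Python sketch below imitates both in memory; the function names are hypothetical, and the row_number() variant samples without replacement.

```python
import math
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Row = Tuple[Hashable, int]  # (class value, row id)


def poisson_sample(rows: List[Row], target: int, rng: random.Random) -> List[Row]:
    """Emit each row Poisson(target / len(rows)) times.

    The expected output size equals target, but a single run
    can return more or fewer rows.
    """
    lam = target / len(rows)
    limit = math.exp(-lam)  # Knuth's method; adequate for small lam
    out: List[Row] = []
    for row in rows:
        k, p = 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                break
            k += 1
        out.extend([row] * k)
    return out


def row_number_sample(rows: List[Row], targets: Dict[Hashable, int],
                      rng: random.Random) -> List[Row]:
    """Imitate ROW_NUMBER() OVER (PARTITION BY class ORDER BY random())
    filtered to row_number <= target: exact size per class, no repeats.
    Assumes each target is at most the class count."""
    by_class: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        by_class[row[0]].append(row)
    out: List[Row] = []
    for cls, members in by_class.items():
        shuffled = sorted(members, key=lambda _: rng.random())  # ORDER BY random()
        out.extend(shuffled[: targets[cls]])  # WHERE row_number <= target
    return out
```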
commit d67a4f6e926aa1a5189716bb3f06ab53ef0d0cc4
Author: Nandish Jayaram <njayaram@...>
Date: 2018-01-30T01:35:02Z
add new keep_null param, code to handle that scenario, and some install-check test cases for it
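Assuming keep_null decides whether rows with a NULL class value are dropped or balanced as a class of their own, its effect on per-class counts can be sketched as follows. `class_counts` is a hypothetical helper, and Python's None stands in for SQL NULL.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional


def class_counts(labels: Iterable[Optional[Hashable]],
                 keep_null: bool = False) -> Dict[Optional[Hashable], int]:
    """Count rows per class value (hypothetical helper).

    keep_null=False drops rows whose class is NULL (None here);
    keep_null=True keeps them as one extra class to be balanced.
    """
    counts = Counter(labels)
    if not keep_null:
        counts.pop(None, None)  # discard the NULL "class" if present
    return dict(counts)
```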
commit cc8f6cd8816470c68df9beccc6141fa2fad4a62c
Author: Nandish Jayaram <njayaram@...>
Date: 2018-01-30T20:00:39Z
Add more validations, test cases, and rename a function
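The commit does not list its validations. As an illustration only, here are checks one might apply to the comma-delimited class-size string from the first commit, assuming a `class=size` pair format; `parse_class_sizes` and its error messages are hypothetical.

```python
from typing import Dict, Set


def parse_class_sizes(spec: str, known_classes: Set[str]) -> Dict[str, int]:
    """Parse and validate a string such as "a=10, b=20" (assumed format)."""
    sizes: Dict[str, int] = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        name, sep, value = item.partition("=")
        name, value = name.strip(), value.strip()
        if not sep or not name or not value:
            raise ValueError(f"expected 'class=size', got {item!r}")
        if name not in known_classes:
            raise ValueError(f"class {name!r} not found in input table")
        if name in sizes:
            raise ValueError(f"class {name!r} given more than once")
        if not value.isdecimal() or int(value) <= 0:
            raise ValueError(f"size for {name!r} must be a positive integer")
        sizes[name] = int(value)
    if not sizes:
        raise ValueError("class size string is empty")
    return sizes
```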
commit f4d02c67a106dea902a5926fe2cc266ab9d44e0f
Author: Nandish Jayaram <njayaram@...>
Date: 2018-01-30T22:35:30Z
reverting changes to stratified_sample.sql_in
----