madlib-dev mailing list archives

From kaknikhil <...@git.apache.org>
Subject [GitHub] madlib pull request #281: Support special characters in MLP, minibatch prepr...
Date Thu, 21 Jun 2018 18:05:48 GMT
GitHub user kaknikhil opened a pull request:

    https://github.com/apache/madlib/pull/281

    Support special characters in MLP, minibatch preprocessor and encode_categorical

    JIRA: MADLIB-1237
    JIRA: MADLIB-1238
    JIRA: MADLIB-1243
    
    A module that needs to support special characters must call
    quote_literal() on every column value that needs to be escaped and
    quoted. The resulting list can then be passed to the
    py_list_to_sql_string function.
    
    We also created a function called get_distinct_col_levels, which calls
    quote_literal and returns a list of escaped column levels. The output
    of this function can then be safely passed to py_list_to_sql_string
    with long_format set to True.
    
    Co-Authored-by: Jingyi Mei <jmei@pivotal.io>
    Co-Authored-by: Rahul Iyer <riyer@apache.org>
    Co-Authored-by: Arvind Sridhar <asridhar@pivotal.io>
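
The quoting strategy described above can be sketched in plain Python.
This is only an illustration under stated assumptions, not the code in
the pull request: quote_literal_sketch is a hypothetical local stand-in
that approximates what PostgreSQL's quote_literal() returns, and
list_to_sql_array_sketch is a hypothetical stand-in for
py_list_to_sql_string, whose real signature and output format may
differ. The ARRAY[...] output form is assumed to correspond to the long
format mentioned in the description.

```python
def quote_literal_sketch(value):
    """Approximate PostgreSQL's quote_literal() for a Python value.

    Hypothetical helper for illustration; the real quoting is done by
    the database, not by this function.
    """
    escaped = str(value).replace("'", "''")  # double embedded single quotes
    if "\\" in escaped:
        # Backslashes are doubled and the literal uses the E'' escape syntax.
        return "E'" + escaped.replace("\\", "\\\\") + "'"
    return "'" + escaped + "'"


def list_to_sql_array_sketch(quoted_values, array_type="text"):
    """Join already-quoted literals into an ARRAY[...] constructor.

    Hypothetical stand-in for py_list_to_sql_string. The values must be
    quoted before they reach this function; it does no escaping itself.
    """
    return "ARRAY[{0}]::{1}[]".format(", ".join(quoted_values), array_type)


# Column levels containing a single quote, double quotes, and a comma.
levels = ["plain", "it's", 'say "hi"', "a,b"]
quoted = [quote_literal_sketch(v) for v in levels]
sql_array = list_to_sql_array_sketch(quoted)
print(sql_array)
```

Running the sketch prints ARRAY['plain', 'it''s', 'say "hi"', 'a,b']::text[].
Because every element is quoted individually before the join, commas
and quotes inside a value cannot be mistaken for array delimiters.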


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/madlib/madlib bug_minibatch_preprocessor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/281.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #281
    
----
commit 4713b24eac1c27ba09cd1152e502b02bb1e13da4
Author: Jingyi Mei <jmei@...>
Date:   2018-05-23T23:29:54Z

    MLP+Minibatch Preprocessing: Support special characters
    
    JIRA: MADLIB-1237
    JIRA: MADLIB-1238
    
    This commit enables special character support for column names and
    column values in MLP and the minibatch preprocessor. We decided to use
    the following strategy for supporting special characters:
    
    A module that needs to support special characters must call
    quote_literal() on every column value that needs to be escaped and
    quoted. The resulting list can then be passed to the
    py_list_to_sql_string function.
    
    We also created a function called get_distinct_col_levels, which calls
    quote_literal and returns a list of escaped column levels. The output
    of this function can then be safely passed to py_list_to_sql_string
    with long_format set to True.
    
    Co-Authored-by: Jingyi Mei <jmei@pivotal.io>
    Co-Authored-by: Rahul Iyer <riyer@apache.org>
    Co-Authored-by: Arvind Sridhar <asridhar@pivotal.io>

commit d24cdfe1dbdcfe8ba2379a70a52cafeeba994c0e
Author: Arvind Sridhar <asridhar@...>
Date:   2018-05-24T00:02:43Z

    Encode categorical variables: handling special characters
    
    JIRA: MADLIB-1238
    JIRA: MADLIB-1243
    
    This commit handles special characters in column names and column
    values. It also adds install check test cases to cover these scenarios.
    
    Co-Authored-by: Jingyi Mei <jmei@pivotal.io>
    Co-Authored-by: Arvind Sridhar <asridhar@pivotal.io>

commit 262e796a9cb17c612c1e844ee7354be1abd11f5d
Author: Nikhil Kak <nkak@...>
Date:   2018-06-18T21:30:20Z

    Cleanup: Remove unnecessary unit tests.
    
    All the unit tests in utilities.py_in were moved to test_utilities.py_in.

----