madlib-dev mailing list archives

From njayaram2 <...@git.apache.org>
Subject [GitHub] madlib pull request #225: Added option for weighted average for both classif...
Date Wed, 24 Jan 2018 19:48:31 GMT
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/225#discussion_r163653414
  
    --- Diff: src/ports/postgres/modules/knn/knn.py_in ---
    @@ -212,22 +244,27 @@ def knn(schema_madlib, point_source, point_column_name, point_id,
                 WHERE {y_temp_table}.r <= {k_val}
                 """.format(**locals()))
     
    -        plpy.execute(
    -            """
    +        plpy.execute("""
                 CREATE TABLE {output_table} AS
    -                SELECT {test_id_temp} AS id, {test_column_name}
    +                {view_def}
    +                SELECT knn_temp.{test_id_temp} AS id ,
    +                    knn_test.data
                         {pred_out}
    --- End diff --
    
    This `pred_out` doesn't look right for classification with weighted averaging. Without weighted averaging, the predicted class is simply the mode of the neighbors' classes. With weighted averaging, however, the prediction must be the class that has the highest weighted sum, not the highest weighted sum itself.
    We should also account for the multi-class scenario when changing this.
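
    To illustrate the distinction, here is a minimal Python sketch, not the MADlib implementation. The function name `weighted_vote`, the `(label, distance)` input shape, and the inverse-distance weighting with a small epsilon are all assumptions made for the example. It sums the weights per class over any number of classes and returns the winning class label, not the winning sum.

```python
from collections import defaultdict


def weighted_vote(neighbors, eps=1e-9):
    """Return the class label with the largest total weight.

    neighbors: iterable of (label, distance) pairs for the k nearest points.
    Assumption: each neighbor is weighted by 1 / distance; eps avoids
    division by zero when a neighbor coincides with the test point.
    """
    totals = defaultdict(float)
    for label, distance in neighbors:
        totals[label] += 1.0 / (distance + eps)
    # The prediction is the label (the key), not the weighted sum (the value).
    return max(totals, key=totals.get)


# Three classes: "b" is the mode, but "c" has the highest weighted sum.
neighbors = [("a", 1.0), ("b", 4.0), ("b", 4.0), ("c", 0.5)]
print(weighted_vote(neighbors))
```

    In this example the unweighted mode would be "b" (two votes), while the single very close "c" neighbor carries more total weight, so the two approaches disagree on the predicted class.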


---
