madlib-dev mailing list archives

From njayaram2 <...@git.apache.org>
Subject [GitHub] madlib pull request #225: Added option for weighted average for both classif...
Date Tue, 16 Jan 2018 23:49:32 GMT
GitHub user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/225#discussion_r161918108
  
    --- Diff: src/ports/postgres/modules/knn/knn.sql_in ---
    @@ -326,6 +331,39 @@ Result, with neighbors sorted from closest to furthest:
     (6 rows)
     </pre>
     
    +
    +-#   Run KNN for classification using the weighted average:
    +<pre class="example">
    +DROP TABLE IF EXISTS knn_result_classification;
    +SELECT * FROM madlib.knn(
    +                'knn_train_data',      -- Table of training data
    +                'data',                -- Col name of training data
    +                'id',                  -- Col name of id in train data
    +                'label',               -- Training labels
    +                'knn_test_data',       -- Table of test data
    +                'data',                -- Col name of test data
    +                'id',                  -- Col name of id in test data
    +                'knn_result_classification',  -- Output table
    +                 3,                    -- Number of nearest neighbors
    +                 True,                 -- True to list nearest-neighbors by id
    +                 'madlib.squared_dist_norm2', -- Distance function
    +                 True                 -- For weighted average
    +                );
    +SELECT * FROM knn_result_classification ORDER BY id;
    +</pre>
    +<pre class="result">
    + id |  data   |     prediction      | k_nearest_neighbours 
    +----+---------+---------------------+----------------------
    +  1 | {2,1}   |                 2.2 | {1,2,3}
    +  2 | {2,6}   |               0.425 | {3,4,5}
    +  3 | {15,40} |  0.0174339622641509 | {5,6,7}
    +  4 | {12,1}  |  0.0379633360193392 | {3,4,5}
    +  5 | {2,90}  | 0.00306428140577315 | {6,7,9}
    +  6 | {50,45} | 0.00214165229166379 | {6,7,8}
    +(6 rows)
    +</pre>
    +
    --- End diff --
    
    I got the following error for this example (was running on Greenplum 5):
    ```
    greenplum=# DROP TABLE IF EXISTS knn_result_classification;
    NOTICE:  table "knn_result_classification" does not exist, skipping
    DROP TABLE
    greenplum=# SELECT * FROM madlib.knn(
    greenplum(#                 'knn_train_data',      -- Table of training data
    greenplum(#                 'data',                -- Col name of training data
    greenplum(#                 'id',                  -- Col name of id in train data
    greenplum(#                 'label',               -- Training labels
    greenplum(#                 'knn_test_data',       -- Table of test data
    greenplum(#                 'data',                -- Col name of test data
    greenplum(#                 'id',                  -- Col name of id in test data
    greenplum(#                 'knn_result_classification',  -- Output table
    greenplum(#                  3,                    -- Number of nearest neighbors
    greenplum(#                  True,                 -- True to list nearest-neighbors by id
    greenplum(#                  'madlib.squared_dist_norm2', -- Distance function
    greenplum(#                  True                 -- For weighted average
    greenplum(#                 );
    ERROR:  plpy.SPIError: function expression in FROM cannot refer to other relations of same query level
    LINE 15:                             a , unnest(k_nearest_neighbours)...
                                                    ^
    QUERY:
                    CREATE TABLE knn_result_classification AS
                        SELECT id, data ,max(prediction) as prediction
                            , array_agg(distinct k_neighbours) AS k_nearest_neighbours
                        FROM
                            ( SELECT __madlib_temp_test_id_temp29900589_1516144312_53639332__ AS id, data
                                    ,sum(1/dist) AS prediction
                                    , array_agg(knn_temp.train_id ORDER BY knn_temp.dist ASC) AS k_nearest_neighbours
                                FROM pg_temp.__madlib_temp_interim_table75130626_1516144312_10216040__ AS knn_temp
                                    JOIN
                                    knn_test_data AS knn_test ON
                                    knn_temp.__madlib_temp_test_id_temp29900589_1516144312_53639332__ = knn_test.id
                                GROUP BY __madlib_temp_test_id_temp29900589_1516144312_53639332__ ,
                                    data, __madlib_temp_label_col_temp66682446_1516144312_5242078__)
                                a , unnest(k_nearest_neighbours) as k_neighbours
                        GROUP BY id, data
    
    CONTEXT:  Traceback (most recent call last):
      PL/Python function "knn", line 36, in <module>
        weighted_avg
      PL/Python function "knn", line 242, in knn
    PL/Python function "knn"
    ```
    
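    This is probably because the generated query relies on behaviour that Postgres 9.x supports but Greenplum 5 does not: `unnest(k_nearest_neighbours)` in the FROM clause refers to a column of the subquery `a` at the same query level. We should stick to functions and constructs that work on both.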
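
    One construct that may work on both, sketched here under assumptions: call `unnest()` in the SELECT list of an extra subquery rather than in the FROM clause, so that no FROM item refers to another relation of the same query level. The table `knn_agg_demo` and its rows are made up for illustration and stand in for the aggregated subquery `a` from the failing query; this has not been verified on Greenplum 5.

    ```sql
    -- Hypothetical stand-in for the aggregated subquery "a" in the failing query.
    -- The table name and the rows are made up for illustration.
    DROP TABLE IF EXISTS knn_agg_demo;
    CREATE TEMP TABLE knn_agg_demo (
        id                   integer,
        data                 integer[],
        prediction           double precision,
        k_nearest_neighbours integer[]
    );
    INSERT INTO knn_agg_demo VALUES (1, '{2,1}', 0.75, '{1,2}');
    INSERT INTO knn_agg_demo VALUES (1, '{2,1}', 0.25, '{3}');

    -- unnest() is evaluated in the SELECT list of subquery "b", so nothing in a
    -- FROM clause refers to another relation of the same query level.
    SELECT id, data,
           max(prediction) AS prediction,
           array_agg(DISTINCT k_neighbours) AS k_nearest_neighbours
    FROM (
        SELECT id, data, prediction,
               unnest(k_nearest_neighbours) AS k_neighbours
        FROM knn_agg_demo
    ) b
    GROUP BY id, data;
    ```

    The outer `max(prediction)` and `array_agg(DISTINCT ...)` are the same as in the generated query; only the place where `unnest()` is called moves.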


