From: njayaram2
To: dev@madlib.apache.org
Reply-To: dev@madlib.apache.org
Subject: [GitHub] madlib pull request #225: Added option for weighted average for both classif...
Content-Type: text/plain
Message-Id: <20180124194831.E9784DFA44@git1-us-west.apache.org>
Date: Wed, 24 Jan 2018 19:48:31 +0000 (UTC)

Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/225#discussion_r163653414

    --- Diff: src/ports/postgres/modules/knn/knn.py_in ---
    @@ -212,22 +244,27 @@ def knn(schema_madlib, point_source, point_column_name, point_id,
                     WHERE {y_temp_table}.r <= {k_val}
                 """.format(**locals()))

    -        plpy.execute(
    -            """
    +        plpy.execute("""
                 CREATE TABLE {output_table} AS
    -                SELECT {test_id_temp} AS id, {test_column_name}
    +                {view_def}
    +                SELECT knn_temp.{test_id_temp} AS id ,
    +                    knn_test.data
                         {pred_out}
    --- End diff --

    This `pred_out` doesn't seem right for classification with weighted
    averaging. Without weighted averaging, we just take the mode as the
    predicted class. With weighted averaging, however, the prediction must
    be the class that has the highest weighted sum, not the highest
    weighted sum itself. We should also take the multi-class scenario into
    account when changing this.

---
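To illustrate the point about returning the class rather than the sum, here is a minimal sketch in plain Python, not the SQL that `knn.py_in` generates. The function name `weighted_vote`, the `(label, distance)` input shape, and the inverse-distance weighting are all assumptions made for the example; the PR may weight neighbors differently.

```python
from collections import defaultdict


def weighted_vote(neighbors):
    """Return the class label with the largest total weight.

    `neighbors` is an iterable of (label, distance) pairs for the k
    nearest points. Weighting each neighbor by 1/distance is an
    assumption for this sketch, not necessarily what the PR does.
    """
    totals = defaultdict(float)
    for label, distance in neighbors:
        if distance == 0:
            # An exact match decides the class and avoids dividing by zero.
            return label
        totals[label] += 1.0 / distance
    if not totals:
        raise ValueError("at least one neighbor is required")
    # Argmax over labels: return the winning label, not its weighted sum.
    # Works for any number of classes, since every label gets its own total.
    return max(totals, key=totals.get)


# Class "a" is the mode (two votes), but "b" has the largest weighted sum.
neighbors = [("a", 1.0), ("b", 0.5), ("c", 4.0), ("a", 2.0)]
print(weighted_vote(neighbors))  # prints "b"
```

In this example the per-class sums are 1.5 for "a", 2.0 for "b", and 0.25 for "c", so the unweighted mode and the weighted prediction disagree. Emitting 2.0 as the prediction would be the error the comment describes.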