From: iyerr3
To: dev@madlib.apache.org
Date: Tue, 10 Jul 2018 21:23:47 +0000 (UTC)
Subject: [GitHub] madlib pull request #289: RF: Add impurity variable importance

Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/289#discussion_r201498820

    --- Diff: src/ports/postgres/modules/recursive_partitioning/random_forest.py_in ---
    @@ -1333,42 +1368,69 @@ def _create_group_table(
     # -------------------------------------------------------------------------
    -def _create_empty_result_table(schema_madlib, output_table_name):
    +def _create_empty_result_table(schema_madlib, output_table_name, importance):
         """Create the result table for all trees in the forest"""
    +    impurity_var_imp_str = """, impurity_var_importance double precision[]);""" if importance else ");"
    +
         sql_create_empty_result_table = """
             CREATE TABLE {output_table_name} (
                 gid integer,
                 sample_id integer,
    -            tree {schema_madlib}.bytea8);
    +            tree {schema_madlib}.bytea8
    +            {impurity_var_imp_str}
             """.format(**locals())
         plpy.notice("sql_create_empty_result_table:\n" + sql_create_empty_result_table)
         plpy.execute(sql_create_empty_result_table)
     # ------------------------------------------------------------
     def _insert_into_result_table(schema_madlib, tree_states, output_table_name,
    -                              grp_key_to_grp_cols, sample_id):
    +                              grp_key_to_grp_cols, sample_id, importance, grouping_cols):
         """Insert one tree to result table"""
    +
    +    impurity_var_imp_str = ''
    +    importance_query = ''
    +    importance_results = ''
    --- End diff --

    IMO, these variables would be easier to read if they were assigned in an `else` clause. Also, `importance_results` is only used inside the `if` block, so it does not need to be defined outside that scope.

---
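A minimal sketch of the restructuring the comment suggests. It assumes a simplified, hypothetical helper `build_insert_sql` (not part of MADlib), a hypothetical `importance_values` argument, and a `$1` placeholder standing in for the serialized tree; the real `_insert_into_result_table` body is not shown in the diff. The empty defaults live in the `else` branch, and `importance_results` is created only in the branch that uses it:

```python
def build_insert_sql(output_table_name, gid, sample_id, importance,
                     importance_values=()):
    """Build an INSERT statement for one tree (hypothetical helper, not MADlib's).

    Defaults for the optional importance column are assigned in the ``else``
    branch, and ``importance_results`` exists only in the ``if`` branch that
    actually uses it.
    """
    if importance:
        # Only needed to build the SQL array literal, so it stays in this branch.
        importance_results = ",".join(repr(float(v)) for v in importance_values)
        impurity_var_imp_str = ", impurity_var_importance"
        importance_query = ", ARRAY[{0}]::double precision[]".format(importance_results)
    else:
        impurity_var_imp_str = ""
        importance_query = ""
    # $1 stands in for the serialized tree, as it would in a prepared statement.
    return ("INSERT INTO {tbl} (gid, sample_id, tree{cols}) "
            "VALUES ({gid}, {sid}, $1{vals})").format(
                tbl=output_table_name, cols=impurity_var_imp_str,
                gid=gid, sid=sample_id, vals=importance_query)
```

For example, `build_insert_sql("rf_out", 1, 2, False)` returns `INSERT INTO rf_out (gid, sample_id, tree) VALUES (1, 2, $1)`, while passing `True` with `[0.5, 0.25]` appends the `impurity_var_importance` column and its array value. Each branch reads as a complete description of one case, which is the readability gain the comment points at.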