From: "Joseph K. Bradley (JIRA)"
To: issues@spark.apache.org
Date: Sun, 24 Dec 2017 07:21:00 +0000 (UTC)
Subject: [jira] [Commented] (SPARK-8418) Add single- and multi-value support to ML Transformers

    [ https://issues.apache.org/jira/browse/SPARK-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16302738#comment-16302738 ]

Joseph K. Bradley commented on SPARK-8418:
------------------------------------------

One more thought: Looking at existing PRs and docs for inputCols & outputCols, I'm worried it may be unclear to users how to use multi-column APIs.  E.g., if OneHotEncoderEstimator (or any of the others) have docs talking about transforming a Numeric column to a Vector column, then users may be confused about whether each inputCol is treated independently, all concatenated in the output, or what.  I'm commenting on the OHE PR but thought this was relevant to all of these PRs.

> Add single- and multi-value support to ML Transformers
> ------------------------------------------------------
>
>                 Key: SPARK-8418
>                 URL: https://issues.apache.org/jira/browse/SPARK-8418
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> It would be convenient if all feature transformers supported transforming columns of single values and multiple values, specifically:
> * one column with one value (e.g., type {{Double}})
> * one column with multiple values (e.g., {{Array[Double]}} or {{Vector}})
> We could go as far as supporting multiple columns, but that may not be necessary since VectorAssembler could be used to handle that.
> Estimators under {{ml.feature}} should also support this.
> This will likely require a short design doc to describe:
> * how input and output columns will be specified
> * schema validation
> * code sharing to reduce duplication
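
The ambiguity raised in the comment (is each inputCol encoded on its own, or are all of them concatenated into one output?) can be made concrete with a small sketch. This is plain Python, not Spark code: `one_hot`, `encode_independently`, and `encode_concatenated` are hypothetical helpers invented for illustration, and the category sizes are passed in by hand. Which reading a given Spark transformer actually follows should be checked against its own documentation.

```python
from typing import Dict, List


def one_hot(index: int, size: int) -> List[float]:
    """Return a dense one-hot vector of length `size` with 1.0 at `index`."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} out of range for size {size}")
    return [1.0 if i == index else 0.0 for i in range(size)]


def encode_independently(
    row: Dict[str, int],
    input_cols: List[str],
    output_cols: List[str],
    sizes: List[int],
) -> Dict[str, object]:
    """Reading 1 (hypothetical): the i-th input column maps to the i-th output column."""
    if not (len(input_cols) == len(output_cols) == len(sizes)):
        raise ValueError("input_cols, output_cols and sizes must have equal length")
    out: Dict[str, object] = dict(row)
    for in_col, out_col, size in zip(input_cols, output_cols, sizes):
        out[out_col] = one_hot(row[in_col], size)
    return out


def encode_concatenated(
    row: Dict[str, int],
    input_cols: List[str],
    output_col: str,
    sizes: List[int],
) -> Dict[str, object]:
    """Reading 2 (hypothetical): all input columns are encoded and joined into one vector."""
    if len(input_cols) != len(sizes):
        raise ValueError("input_cols and sizes must have equal length")
    out: Dict[str, object] = dict(row)
    joined: List[float] = []
    for in_col, size in zip(input_cols, sizes):
        joined.extend(one_hot(row[in_col], size))
    out[output_col] = joined
    return out


# A single row with two categorical index columns (made-up example data).
row = {"color": 2, "shape": 0}

independent = encode_independently(
    row, ["color", "shape"], ["color_vec", "shape_vec"], [3, 2]
)
concatenated = encode_concatenated(row, ["color", "shape"], "features", [3, 2])
```

Under the first reading, `independent` gains two new columns, `color_vec` of length 3 and `shape_vec` of length 2. Under the second, `concatenated` gains a single `features` column of length 5. Documentation that only says "transforms a Numeric column to a Vector column" does not tell a reader which of the two to expect.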