From: BryanCutler
To: reviews@spark.apache.org
Reply-To: reviews@spark.apache.org
Subject: [GitHub] spark pull request #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Ou...
Date: Fri, 29 Jul 2016 17:52:12 +0000 (UTC)

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14308#discussion_r72833265

    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaMaxAbsScalerExample.java ---
    @@ -34,10 +44,17 @@ public static void main(String[] args) {
           .getOrCreate();

         // $example on$
    -    Dataset dataFrame = spark
    -      .read()
    -      .format("libsvm")
    -      .load("data/mllib/sample_libsvm_data.txt");
    +    List data = Arrays.asList(
    --- End diff --

The data in the file is fine, but it uses sparse vectors, so the printed result doesn't really show anything. With just a small sample dataset, you can see what the scaler is doing from the output.

Before:
```
+-----+--------------------+--------------------+
|label|            features|      scaledFeatures|
+-----+--------------------+--------------------+
|  0.0|(692,[127,128,129...|(692,[127,128,129...|
|  1.0|(692,[158,159,160...|(692,[158,159,160...|
|  1.0|(692,[124,125,126...|(692,[124,125,126...|
```

After:
```
+--------------+----------------+
|      features|  scaledFeatures|
+--------------+----------------+
|[1.0,0.1,-8.0]|[0.25,0.01,-1.0]|
|[2.0,1.0,-4.0]|  [0.5,0.1,-0.5]|
|[4.0,10.0,8.0]|   [1.0,1.0,1.0]|
```
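For readers who want to check the numbers in the "after" table by hand, here is a minimal sketch of max-abs scaling in plain Java, with no Spark dependency. It is not Spark's `MaxAbsScaler` implementation. The class name `MaxAbsScaleSketch`, the method `scale`, the assumption of rectangular input, and the choice to leave an all-zero column unchanged are all assumptions made for illustration. The idea is simply that each column is divided by its largest absolute value, so every result lands in [-1, 1].

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; not Spark's MaxAbsScaler implementation.
final class MaxAbsScaleSketch {

    // Divide every column by its largest absolute value so results fall in [-1, 1].
    // Assumes all rows have the same length.
    static double[][] scale(double[][] rows) {
        if (rows.length == 0) {
            return new double[0][];
        }
        int cols = rows[0].length;

        // Pass 1: find the largest absolute value in each column.
        double[] maxAbs = new double[cols];
        for (double[] row : rows) {
            for (int j = 0; j < cols; j++) {
                maxAbs[j] = Math.max(maxAbs[j], Math.abs(row[j]));
            }
        }

        // Pass 2: rescale. An all-zero column is left unchanged (this sketch's choice).
        double[][] scaled = new double[rows.length][cols];
        for (int i = 0; i < rows.length; i++) {
            for (int j = 0; j < cols; j++) {
                scaled[i][j] = maxAbs[j] == 0.0 ? rows[i][j] : rows[i][j] / maxAbs[j];
            }
        }
        return scaled;
    }

    public static void main(String[] args) {
        // The three dense feature vectors shown in the "after" table.
        double[][] features = {
            {1.0, 0.1, -8.0},
            {2.0, 1.0, -4.0},
            {4.0, 10.0, 8.0}
        };
        for (double[] row : scale(features)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The column maxima here are 4.0, 10.0 and 8.0, so the first row becomes roughly [0.25, 0.01, -1.0], which agrees with the `scaledFeatures` column above up to floating-point rounding. With the 692-dimensional sparse vectors, the same arithmetic happens but the truncated printout hides it.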