Date: Thu, 29 Mar 2018 10:11:00 +0000 (UTC)
From: "Furcy Pin (JIRA)" <jira@apache.org>
To: issues@spark.apache.org
Message-ID: <JIRA.13065016.1492589196000.138346.1522318260238@Atlassian.JIRA>
In-Reply-To: <JIRA.13065016.1492589196000@Atlassian.JIRA>
References: <JIRA.13065016.1492589196000@Atlassian.JIRA> <JIRA.13065016.1492589196200@jira-lw-us.apache.org>
Subject: [jira] [Comment Edited] (SPARK-20384) supporting value classes over
 primitives in DataSets


    [ https://issues.apache.org/jira/browse/SPARK-20384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16418691#comment-16418691 ]

Furcy Pin edited comment on SPARK-20384 at 3/29/18 10:10 AM:
-------------------------------------------------------------

+1 on this issue.

I think the generic use case is that the Spark Encoder magic that automatically transforms a DataFrame into a case class currently only works for base types.

This is great if you have a
{code:java}
case class Table(id: Long, attribute: String)
{code}
with simple attributes,

BUT, if you want to wrap your attribute in another simple class like this
{code:java}
case class Attribute(value: String) {
  // some specific methods...
}
case class Table(id: Long, attribute: Attribute){code}
Then this won't work automatically, unless the "attribute" column in your DataFrame is a struct itself.
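
A minimal sketch of the struct point, and of one manual way around it, assuming a DataFrame with the flat columns (id: bigint, attribute: string). The names TableRow, WrapAttributeSketch, wrap and the helper normalized are hypothetical and only serve the illustration. This is not the automatic mapping asked for here: the wrapping is done by hand in a map step, and the resulting Dataset stores "attribute" as a struct with a single "value" field.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Flat shape matching a DataFrame with columns (id: bigint, attribute: string).
// TableRow is a hypothetical helper type, not part of Spark.
case class TableRow(id: Long, attribute: String)

// Domain model from the snippet above; `normalized` is a made-up example method.
case class Attribute(value: String) {
  def normalized: String = value.trim.toLowerCase
}
case class Table(id: Long, attribute: Attribute)

object WrapAttributeSketch {
  // Decode into the flat case class first, then wrap the attribute by hand.
  def wrap(df: DataFrame): Dataset[Table] = {
    val spark = df.sparkSession
    import spark.implicits._
    df.as[TableRow].map(r => Table(r.id, Attribute(r.attribute)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("wrap-attribute-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1L, "Foo"), (2L, "Bar")).toDF("id", "attribute")
    val wrapped = wrap(df)

    // "attribute" is now a struct column containing one string field named "value".
    wrapped.printSchema()
    spark.stop()
  }
}
```

The extra map step costs a deserialize/serialize round trip per row, which is part of why a built-in solution would be nicer.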

The problem is that currently there doesn't seem to be any simple way to achieve this, which really limits the usefulness of the whole Encoder magic.

And if a nice, simple way to achieve this exists, please document it, as I did not find it.

EDIT: after giving it some thought, I tried to do this:
{code:java}
implicit class Attribute(value: String)
case class Table(id: Long, attribute: Attribute){code}
But it does not work either. If it were possible like this, it would be a nice way to do it.
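
A possible explanation, sketched in plain Scala with no Spark involved: an implicit class only adds a compile-time conversion from String to Attribute, and the class itself is still not a case class. Whether this is exactly why the Encoder derivation rejects it is an assumption, not something confirmed in this issue. The object name ImplicitClassSketch is made up for illustration.

```scala
object ImplicitClassSketch {
  // An implicit class is a regular class plus an implicit conversion String => Attribute.
  // It has to live inside an object, class, or trait.
  implicit class Attribute(val value: String)

  case class Table(id: Long, attribute: Attribute)

  def main(args: Array[String]): Unit = {
    // The conversion kicks in when the compiler sees a String where an Attribute is expected.
    val t = Table(1L, "foo")
    println(t.attribute.value) // prints "foo"

    // Attribute is not a case class, so it is not a Product. Spark's built-in
    // derivation works from the structure of the type, not from implicit
    // conversions in scope (assumption stated in the text above).
    println(t.attribute.isInstanceOf[Product]) // prints "false"
  }
}
```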


> supporting value classes over primitives in DataSets
> ----------------------------------------------------
>
>                 Key: SPARK-20384
>                 URL: https://issues.apache.org/jira/browse/SPARK-20384
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer, SQL
>    Affects Versions: 2.1.0
>            Reporter: Daniel Davis
>            Priority: Minor
>
> As a Spark user who uses value classes in Scala for modelling domain objects, I would also like to make use of them for Datasets.
> For example, I would like to use the {{User}} case class, which uses a value class for its {{id}}, as the type for a Dataset:
> - the underlying primitive should be mapped to the value-class column
> - functions on the column (for example, comparison) should only work if defined on the value class, and should use those implementations
> - show() should pick up the toString method of the value-class
> {code}
> case class Id(value: Long) extends AnyVal {
>   // toString already exists on Any, so it has to be overridden
>   override def toString: String = value.toHexString
> }
> case class User(id: Id, name: String)
> // needed for toDS() and as[User]
> import spark.implicits._
> val ds = spark.sparkContext
>   .parallelize(0L to 12L).map(i => (i, f"name-$i")).toDS()
>   .withColumnRenamed("_1", "id")
>   .withColumnRenamed("_2", "name")
> // mapping should work
> val usrs = ds.as[User]
> // show should use toString
> usrs.show()
> // comparison with long should throw exception, as not defined on Id
> usrs.col("id") > 0L
> {code}
> For example `.show()` should use the toString of the `Id` value class:
> {noformat}
> +---+-------+
> | id|   name|
> +---+-------+
> |  0| name-0|
> |  1| name-1|
> |  2| name-2|
> |  3| name-3|
> |  4| name-4|
> |  5| name-5|
> |  6| name-6|
> |  7| name-7|
> |  8| name-8|
> |  9| name-9|
> |  a|name-10|
> |  b|name-11|
> |  c|name-12|
> +---+-------+
> {noformat}

