spark-issues mailing list archives

From "Tomas Nykodym (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-23244) Incorrect handling of default values when deserializing python wrappers of scala transformers
Date Fri, 26 Jan 2018 23:22:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-23244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tomas Nykodym updated SPARK-23244:
----------------------------------
    Description: 
Default values are not handled properly when serializing/deserializing Python transformers
that wrap Scala objects. After deserialization, the default values that were derived from the
uid are not restored correctly, and params that were never explicitly set come back marked as
set, carrying the original object's default values.

Here's a simple code example using Bucketizer:

{code:python}
>>> from pyspark.ml.feature import Bucketizer
>>> a = Bucketizer() 
>>> a.save("bucketizer0")
>>> b = Bucketizer.load("bucketizer0")
>>> a._defaultParamMap[a.outputCol]
u'Bucketizer_440bb49206c148989db7__output'
>>> b._defaultParamMap[b.outputCol]
u'Bucketizer_41cf9afbc559ca2bfc9a__output'
>>> a.isSet(a.outputCol)
False 
>>> b.isSet(b.outputCol)
True
>>> a.getOutputCol()
u'Bucketizer_440bb49206c148989db7__output'
>>> b.getOutputCol()
u'Bucketizer_440bb49206c148989db7__output'
{code}
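
For reference, a minimal workaround sketch (this assumes the Params.clear method on pyspark ML params; the outputs shown are the values expected from the session above, not a captured transcript). Clearing the spuriously set param should make the loaded transformer fall back to its own uid-based default:

{code:python}
>>> # Hypothetical workaround, not a fix: drop the param value that was
>>> # incorrectly marked as "set" during deserialization.
>>> b.clear(b.outputCol)
>>> b.isSet(b.outputCol)   # expected to report False again
False
>>> b.getOutputCol()       # expected to fall back to b's own uid-based default
u'Bucketizer_41cf9afbc559ca2bfc9a__output'
{code}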

  was:
Default values are not handled properly when serializing/deserializing Python transformers
that wrap Scala objects. After deserialization, the default values that were derived from the
uid are not restored correctly, and params that were never explicitly set come back marked as
set, carrying the original object's default values.

Here's a simple code example using Bucketizer:

{code:python}
>>> from pyspark.ml.feature import Bucketizer
>>> a = Bucketizer()
>>> a.save("bucketizer0")
>>> b = Bucketizer.load("bucketizer0")
>>> a._defaultParamMap[a.outputCol]
u'Bucketizer_440bb49206c148989db7__output'
>>> b._defaultParamMap[b.outputCol]
u'Bucketizer_41cf9afbc559ca2bfc9a__output'
>>> a.isSet(a.outputCol)
False
>>> b.isSet(b.outputCol)
True
>>> a.getOutputCol()
u'Bucketizer_440bb49206c148989db7__output'
>>> b.getOutputCol()
u'Bucketizer_440bb49206c148989db7__output'
{code}


> Incorrect handling of default values when deserializing python wrappers of scala transformers
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23244
>                 URL: https://issues.apache.org/jira/browse/SPARK-23244
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 2.2.1
>            Reporter: Tomas Nykodym
>            Priority: Minor
>
> Default values are not handled properly when serializing/deserializing Python transformers
> that wrap Scala objects. After deserialization, the default values that were derived from the
> uid are not restored correctly, and params that were never explicitly set come back marked as
> set, carrying the original object's default values.
> Here's a simple code example using Bucketizer:
> {code:python}
> >>> from pyspark.ml.feature import Bucketizer
> >>> a = Bucketizer() 
> >>> a.save("bucketizer0")
> >>> b = Bucketizer.load("bucketizer0")
> >>> a._defaultParamMap[a.outputCol]
> u'Bucketizer_440bb49206c148989db7__output'
> >>> b._defaultParamMap[b.outputCol]
> u'Bucketizer_41cf9afbc559ca2bfc9a__output'
> >>> a.isSet(a.outputCol)
> False 
> >>> b.isSet(b.outputCol)
> True
> >>> a.getOutputCol()
> u'Bucketizer_440bb49206c148989db7__output'
> >>> b.getOutputCol()
> u'Bucketizer_440bb49206c148989db7__output'
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

