spark-dev mailing list archives

From "Franklyn D'souza" <franklyn.dso...@shopify.com>
Subject Nulls getting converted to 0 with spark 2.0 SNAPSHOT
Date Mon, 07 Mar 2016 19:30:01 GMT
Just wanted to confirm that this is the expected behaviour.

Basically, I'm putting nulls into a non-nullable LongType column and running a
transformation on that column; in the result, the nulls have been converted
to 0.

Here's an example:

from pyspark.sql import types, functions as F

sql_schema = types.StructType([
  types.StructField("a", types.LongType(), True),
  types.StructField("b", types.StringType(),  True),
])

df = sqlCtx.createDataFrame([
    (1, "one"),
    (None, "two"),
], sql_schema)

# Everything is fine here
df.collect()  # [Row(a=1, b=u'one'), Row(a=None, b=u'two')]

def assert_not_null(val):
    # identity function; despite the name, it does no checking,
    # so nulls should pass through unchanged
    return val

udf = F.udf(assert_not_null, types.LongType())

df = df.withColumnRenamed('a', "_tmp_col")
df = df.withColumn('a', udf(df._tmp_col))
df = df.drop("_tmp_col")

# None gets converted to 0
df.collect()  # [Row(b=u'one', a=1), Row(b=u'two', a=0)]
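For what it's worth, the behaviour looks analogous to storing an optional
value in a primitive slot: None has no primitive representation, so it falls
back to the type's default (0 for a 64-bit integer). A minimal pure-Python
sketch of that idea, using ctypes as a stand-in (this is only an analogy, not
Spark internals):

```python
import ctypes

# A primitive 64-bit integer buffer, like a non-nullable long column.
values = [1, None]
buf = (ctypes.c_long * len(values))()

for i, v in enumerate(values):
    # A primitive slot cannot hold None; it takes the default value 0,
    # which mirrors the null -> 0 conversion seen above.
    buf[i] = v if v is not None else 0

print(list(buf))  # [1, 0]
```

If that's what is happening, the nullability flag on the UDF's output column
would explain why the None survives before the transformation but not after.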

Thanks,

Franklyn
