spark-issues mailing list archives

From "Earthson Lu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)
Date Thu, 14 Jan 2016 06:57:39 GMT

    [ https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15097724#comment-15097724 ]

Earthson Lu commented on SPARK-12746:
-------------------------------------

OK, I see. :)

If there is no nullability in ML, how can we implement a Transformer that fills in missing
values (which are always represented as NULL)? I think we need to support nullability for
preprocessing, so that we can get clean data for further operations. I can't imagine a
situation in which we could do nothing when the data contains NULL.
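
To make the preprocessing argument concrete, here is a minimal sketch in plain Python rather
than Spark code. `Column` and `fill_missing` are hypothetical stand-ins, not part of any Spark
API; the point is only that such a step must accept nullable input before it can produce
non-nullable output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a DataFrame column: values plus a nullable flag."""
    values: List[Optional[float]]
    nullable: bool


def fill_missing(col: Column, default: float) -> Column:
    """Replace every None with `default`; the result can be declared non-nullable."""
    filled = [default if v is None else v for v in col.values]
    return Column(values=filled, nullable=False)


# The input must be allowed to contain NULLs, otherwise there is nothing to fill.
raw = Column(values=[1.0, None, 3.0], nullable=True)
clean = fill_missing(raw, default=0.0)
```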

- - -

I think the type checking API is independent of nullability in ML. It is common for a single
transformer to accept either BooleanType or IntegerType. It may be a good idea to implement
the test condition and the assertion separately.
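
One possible shape for that separation, sketched in plain Python. The names `is_accepted` and
`require_accepted` are hypothetical, and types are modeled as plain strings instead of Spark's
DataType objects: a pure predicate carries the test condition, and a thin wrapper turns it
into an assertion.

```python
from typing import Iterable

# Type names as plain strings; stand-ins for illustration only.
BOOLEAN = "BooleanType"
INTEGER = "IntegerType"
STRING = "StringType"


def is_accepted(actual: str, accepted: Iterable[str]) -> bool:
    """Test condition: a pure predicate that never raises."""
    return actual in set(accepted)


def require_accepted(column: str, actual: str, accepted: Iterable[str]) -> None:
    """Assertion: reuses the predicate and raises with a descriptive message."""
    accepted = list(accepted)
    if not is_accepted(actual, accepted):
        raise ValueError(
            f"Column {column} must be one of {accepted} but was {actual}."
        )


# A transformer that accepts two input types can pass both to the same check.
require_accepted("flag", INTEGER, [BOOLEAN, INTEGER])
```

With this split, a caller that only wants to branch on the type can use the predicate, while
schema validation uses the assertion.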

> ArrayType(_, true) should also accept ArrayType(_, false)
> ---------------------------------------------------------
>
>                 Key: SPARK-12746
>                 URL: https://issues.apache.org/jira/browse/SPARK-12746
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, SQL
>    Affects Versions: 1.6.0
>            Reporter: Earthson Lu
>
> I see that CountVectorizer has a schema check for ArrayType that requires ArrayType(StringType, true).
> ArrayType(StringType, false) is just a special case of ArrayType(StringType, true), but it does not pass this type check.
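
The "special case" relation in the description can be sketched as follows. This is plain
Python with hypothetical stand-in classes (`StringType`, `ArrayType`, `accepts`), not the
Spark API or the fix that was eventually applied: an array that cannot contain nulls is
accepted wherever a nullable array is expected, but not the other way around.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringType:
    """Stand-in for a scalar element type."""


@dataclass(frozen=True)
class ArrayType:
    """Stand-in for an array type; contains_null mirrors the second argument."""
    element_type: object
    contains_null: bool


def accepts(expected, actual) -> bool:
    """True if a column of type `actual` is safe where `expected` is required."""
    if isinstance(expected, ArrayType) and isinstance(actual, ArrayType):
        # A no-nulls array is a special case of a nulls-allowed array, not vice versa.
        null_ok = expected.contains_null or not actual.contains_null
        return null_ok and accepts(expected.element_type, actual.element_type)
    # Scalar types fall back to plain equality.
    return expected == actual


nullable_strings = ArrayType(StringType(), True)
non_null_strings = ArrayType(StringType(), False)
```

A strict equality check treats `nullable_strings` and `non_null_strings` as different types,
which is the behavior the report describes; `accepts(nullable_strings, non_null_strings)`
returns true under this sketch.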



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


