spark-issues mailing list archives

From "Apache Spark (JIRA)" <>
Subject [jira] [Assigned] (SPARK-19843) UTF8String => (int / long) conversion expensive for invalid inputs
Date Tue, 07 Mar 2017 02:39:33 GMT


Apache Spark reassigned SPARK-19843:

    Assignee: Apache Spark

> UTF8String => (int / long) conversion expensive for invalid inputs
> ------------------------------------------------------------------
>                 Key: SPARK-19843
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Tejas Patil
>            Assignee: Apache Spark
> When the input is invalid, converting a UTF8String to int or long returns null. This
comes at a cost, because the conversion method (e.g. [0]) throws an exception to signal the
failure. Exception handling is expensive: it converts the UTF8String into a Java string and
populates the stack trace (which is a native call). While migrating workloads from Hive to
Spark, I see that, in aggregate, this makes queries slower than they are in Hive.
> The exception only indicates that the conversion failed. It is not propagated to users,
so it would be good to avoid it.
> A couple of options:
> - Return Integer / Long (instead of primitive types), which can be set to null if the
conversion fails. This requires boxing, which is very bad for performance, so it is a big no.
> - Hive has a pre-check [1] for this. It is not a perfect safety net, but it helps
catch typical bad inputs, e.g. the empty string or "null".
> [0] :
> [1] :
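
The pre-check option described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in Spark or Hive: the class IntPreCheck and the methods looksLikeInt, parseOrThrow and tryParseInt are hypothetical names, the input is a plain UTF-8 byte array rather than a UTF8String, and Integer.parseInt stands in for the exception-throwing conversion. The parsing rules (optional sign, ASCII digits only) are also an assumption and are not meant to match either project's cast semantics.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a cheap pre-check in front of an exception-throwing parse.
final class IntPreCheck {
    private IntPreCheck() {}

    /**
     * Cheap syntactic screen: an optional sign followed by 1 to 10 ASCII digits.
     * Like the pre-check mentioned in the issue, this is not a perfect safety net:
     * an out-of-range value such as "9999999999" still passes.
     */
    public static boolean looksLikeInt(byte[] utf8) {
        int n = utf8.length;
        if (n == 0) {
            return false;                       // empty string
        }
        int i = (utf8[0] == '-' || utf8[0] == '+') ? 1 : 0;
        int digits = n - i;
        if (digits == 0 || digits > 10) {
            return false;                       // sign only, or too long for an int
        }
        for (; i < n; i++) {
            byte b = utf8[i];
            if (b < '0' || b > '9') {
                return false;                   // e.g. "null", "abc", "1.5"
            }
        }
        return true;
    }

    /** Stand-in for the existing conversion that throws on invalid input. */
    public static int parseOrThrow(byte[] utf8) {
        return Integer.parseInt(new String(utf8, StandardCharsets.UTF_8));
    }

    /**
     * Returns true and stores the value in out[0] on success.
     * Returns false when the caller should produce null.
     * The one-element array avoids boxing the result into an Integer.
     */
    public static boolean tryParseInt(byte[] utf8, int[] out) {
        if (!looksLikeInt(utf8)) {
            return false;                       // typical bad input: no exception raised
        }
        try {
            out[0] = parseOrThrow(utf8);
            return true;
        } catch (NumberFormatException e) {
            return false;                       // rare case: passed the screen but overflowed
        }
    }
}
```

With this shape, inputs such as the empty string or "null" are rejected by the screen without constructing an exception or a stack trace, and the exception path is only reached for the uncommon inputs that look numeric but do not fit in an int.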

This message was sent by Atlassian JIRA
