spark-issues mailing list archives

From "Tim Preece (JIRA)" <>
Subject [jira] [Commented] (SPARK-12319) Address endian specific problems surfaced in 1.6
Date Tue, 22 Dec 2015 14:34:46 GMT


Tim Preece commented on SPARK-12319:

Hi Michael,
I think this may be a problem with the new Dataset API, in particular the new "as" function
of DataFrame, which I see is tagged as Experimental.

When we run the DatasetAggregatorSuite test "typed aggregation: class input with reordering",
the implementation seems to confuse the ordering of the data in the UnsafeRow
(string, int) with the ordering of the schema (int, string). This results in a test failure
that shows up on BE platforms (although the data is also corrupted on LE platforms).
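
To illustrate why a slot-order mix-up could corrupt data on both kinds of platform yet
only be noticed on one, here is a minimal sketch. It is not Spark's code: the class
SlotOrderSketch, its methods, and the simplified layout (two 8-byte slots, with the string
slot packing an offset and a length into one long) are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical model of a two-field row; not Spark's UnsafeRow implementation.
final class SlotOrderSketch {
    static final int SLOT = 8; // assumed fixed slot width in bytes

    // Writes a row in physical order (string, int) using the given byte order.
    // Slot 0 packs an assumed (offset, length) pair for the string into one long;
    // slot 1 holds the int value in its first four bytes.
    static ByteBuffer writeRow(ByteOrder order, int strOffset, int strLength, int value) {
        ByteBuffer row = ByteBuffer.allocate(2 * SLOT).order(order);
        row.putLong(0, ((long) strOffset << 32) | (strLength & 0xFFFFFFFFL));
        row.putInt(SLOT, value);
        return row;
    }

    // Reads the first four bytes of a slot as an int, as a reader would if its
    // schema said that slot holds an int.
    static int readIntAt(ByteBuffer row, int ordinal) {
        return row.getInt(ordinal * SLOT);
    }

    public static void main(String[] args) {
        ByteOrder[] orders = {ByteOrder.LITTLE_ENDIAN, ByteOrder.BIG_ENDIAN};
        for (ByteOrder order : orders) {
            ByteBuffer row = writeRow(order, 24, 3, 1);
            System.out.println(order
                + ": int read from slot 1 = " + readIntAt(row, 1)
                + ", int read from slot 0 = " + readIntAt(row, 0));
        }
    }
}
```

In this model, reading the int from slot 1 returns 1 under either byte order. Reading it
from slot 0, as a reader following an (int, string) schema would, returns the string
length (3) on LE and the string offset (24) on BE. Both values are wrong, but they differ,
so a test could plausibly pass on one platform and fail on the other.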

At the moment I'm not sure how to fix this, so any pointers would be helpful.

> Address endian specific problems surfaced in 1.6
> ------------------------------------------------
>                 Key: SPARK-12319
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>         Environment: Problems apparent on BE, LE could be impacted too
>            Reporter: Adam Roberts
>            Priority: Critical
> JIRA to cover endian specific problems - since testing 1.6 I've noticed problems with
> DataFrames on BE platforms, e.g.
> [~joshrosen] [~yhuai]
> Current progress: using and
> within UnsafeRowSerializer fixes three test failures in ExchangeCoordinatorSuite but I'm concerned
> around performance/wider functional implications
> "org.apache.spark.sql.DatasetAggregatorSuite.typed aggregation: class input with reordering"
> fails as we expect "one, 1" but instead get "one, 9" - we believe the issue lies within,
> specifically around: return (wi << 6) + subIndex + java.lang.Long.numberOfTrailingZeros(word);
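
For readers unfamiliar with the quoted return expression, the sketch below shows the kind
of next-set-bit search it appears to belong to, and how the result depends on the byte
order used to load a word. The class NextSetBitSketch, its methods, and the use of a plain
long[] are assumptions for illustration. This is not the Spark source, and it is not
claimed to be the cause of the failure.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical bitset search over 64-bit words; not Spark's implementation.
final class NextSetBitSketch {
    // Returns the index of the first set bit at or after fromIndex, or -1 if none.
    static int nextSetBit(long[] words, int fromIndex) {
        int wi = fromIndex >> 6;             // word index: 64 bits per word
        if (wi >= words.length) {
            return -1;
        }
        int subIndex = fromIndex & 0x3f;     // bit offset inside that word
        long word = words[wi] >>> subIndex;  // discard bits below fromIndex
        if (word != 0) {
            return (wi << 6) + subIndex + Long.numberOfTrailingZeros(word);
        }
        // Nothing left in the starting word, so scan the following words.
        for (wi = wi + 1; wi < words.length; wi++) {
            if (words[wi] != 0) {
                return (wi << 6) + Long.numberOfTrailingZeros(words[wi]);
            }
        }
        return -1;
    }

    // Interprets the same eight bytes as one long under the given byte order.
    static long wordFromBytes(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getLong();
    }

    public static void main(String[] args) {
        byte[] bytes = {0x02, 0, 0, 0, 0, 0, 0, 0};
        long le = wordFromBytes(bytes, ByteOrder.LITTLE_ENDIAN);
        long be = wordFromBytes(bytes, ByteOrder.BIG_ENDIAN);
        System.out.println("LE: " + nextSetBit(new long[] {le}, 0));
        System.out.println("BE: " + nextSetBit(new long[] {be}, 0));
    }
}
```

With these eight bytes, the first set bit is reported at index 1 when the word is loaded
as little-endian and at index 57 when it is loaded as big-endian. Any bitset that is
written byte by byte and read back word by word (or the reverse) would be sensitive to
this difference.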
