spark-issues mailing list archives

From "Davies Liu (JIRA)" <>
Subject [jira] [Resolved] (SPARK-10877) Assertions fail straightforward DataFrame job due to word alignment
Date Fri, 16 Oct 2015 23:02:05 GMT


Davies Liu resolved SPARK-10877.
          Resolution: Fixed
    Target Version/s: 1.5.2, 1.6.0

> Assertions fail straightforward DataFrame job due to word alignment
> -------------------------------------------------------------------
>                 Key: SPARK-10877
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Matt Cheah
>            Assignee: Davies Liu
>         Attachments: SparkFilterByKeyTest.scala
> I have some code that I’m running in a unit test suite, but it is failing with an assertion error.
> I have translated the failing JUnit test into a Scala script, which I will attach to the ticket. The assertion error is the following:
> {code}
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.AssertionError: lengthInBytes must be a multiple of 8 (word-aligned)
> at org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeWords(
> at org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.hashCode(
> at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.hashCode(rows.scala:149)
> at org.apache.spark.sql.catalyst.expressions.GenericMutableRow.hashCode(rows.scala:247)
> at org.apache.spark.HashPartitioner.getPartition(Partitioner.scala:85)
> at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$4$$anonfun$apply$4.apply(Exchange.scala:180)
> at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$4$$anonfun$apply$4.apply(Exchange.scala:180)
> at scala.collection.Iterator$$anon$
> {code}
> However, the code runs normally and computes the correct result when assertions are turned off.
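To illustrate why the same job can fail in one run and succeed in another: Java `assert` statements are evaluated only when the JVM has assertions enabled (for example with `-ea`), which test harnesses commonly do. The sketch below is a stand-alone example under that assumption, not Spark code; `AssertionDemo`, `checkedLength`, and `assertionsEnabled` are hypothetical names chosen for illustration.

```java
final class AssertionDemo {
    // Hypothetical stand-in for a routine that requires word-aligned input.
    // The check is skipped entirely when assertions are disabled.
    static int checkedLength(int lengthInBytes) {
        assert lengthInBytes % 8 == 0
            : "lengthInBytes must be a multiple of 8 (word-aligned)";
        return lengthInBytes;
    }

    // Common idiom: the assignment inside the assert only runs
    // when assertions are enabled for this class.
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
        try {
            System.out.println("accepted length: " + checkedLength(12));
        } catch (AssertionError e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Started with `java -ea AssertionDemo`, the 12-byte length is rejected; started without the flag, the same call returns normally.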
> I traced the code and found that when hashUnsafeWords was called, it was given a byte length of 12, which is not a multiple of 8. The job nevertheless seems to compute the correct result. I can’t simply disable assertions for my unit test, though.
> A few things we need to understand:
> 1. Why is the lengthInBytes of size 12?
> 2. Is it actually a problem that the byte length is not word-aligned? If so, how should we fix the byte length? If it's not a problem, why is the assertion raising a false positive?
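
To make the arithmetic behind these questions concrete, here is a minimal sketch of the word-alignment check and of rounding a byte length up to the next multiple of 8. It assumes an 8-byte word, as the assertion message states. `WordAlignment`, `isWordAligned`, and `roundUpToWord` are hypothetical helpers written for this example. They are not taken from Spark, and the sketch does not claim that padding is how the issue was resolved.

```java
final class WordAlignment {
    // Word size in bytes, per the assertion message "multiple of 8".
    static final int WORD_SIZE = 8;

    // True when the length satisfies the check that the assertion enforces.
    static boolean isWordAligned(int lengthInBytes) {
        return lengthInBytes % WORD_SIZE == 0;
    }

    // Rounds a non-negative length up to the next multiple of WORD_SIZE.
    // The mask trick is valid because WORD_SIZE is a power of two.
    static int roundUpToWord(int lengthInBytes) {
        return (lengthInBytes + WORD_SIZE - 1) & ~(WORD_SIZE - 1);
    }

    public static void main(String[] args) {
        int len = 12; // the byte length reported in the ticket
        System.out.println(isWordAligned(len)); // false
        System.out.println(len % WORD_SIZE);    // 4 bytes past the last full word
        System.out.println(roundUpToWord(len)); // 16
    }
}
```

A 12-byte value therefore covers one full word plus 4 trailing bytes. A hash routine that only accepts whole words would need either a padded 16-byte buffer or separate handling of the tail.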
