spark-reviews mailing list archives

From HyukjinKwon <...@git.apache.org>
Subject [GitHub] spark issue #19816: [SPARK-21693][FOLLOWUP][R] Reduce shuffle partitions run...
Date Sat, 25 Nov 2017 09:11:20 GMT
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19816
  
    @felixcheung, I just tried lowering this by default and ran the tests. Some tests seem to be failing. For example, if we lower `spark.sql.shuffle.partitions` to 5, these fail additionally:
    
    ```
    Failed -------------------------------------------------------------------------
    1. Failure: spark.als (@test_mllib_recommendation.R#36) ------------------------
    predictions$prediction not equal to c(-0.1380762, 2.6258414, -1.5018409).
    3/3 mismatches (average diff: 2.75)
    [1]  2.626 - -0.138 ==  2.76
    [2] -1.502 -  2.626 == -4.13
    [3] -0.138 - -1.502 ==  1.36
    
    
    2. Failure: pivot GroupedData column (@test_sparkSQL.R#1921) -------------------
    `sum1` not equal to `correct_answer`.
    Component “year”: Mean relative difference: 0.0004961548
    Component “Python”: Mean relative difference: 0.0952381
    Component “R”: Mean relative difference: 0.5454545
    
    
    3. Failure: pivot GroupedData column (@test_sparkSQL.R#1922) -------------------
    `sum2` not equal to `correct_answer`.
    Component “year”: Mean relative difference: 0.0004961548
    Component “Python”: Mean relative difference: 0.0952381
    Component “R”: Mean relative difference: 0.5454545
    
    
    4. Failure: pivot GroupedData column (@test_sparkSQL.R#1923) -------------------
    `sum3` not equal to `correct_answer`.
    Component “year”: Mean relative difference: 0.0004961548
    Component “Python”: Mean relative difference: 0.0952381
    Component “R”: Mean relative difference: 0.5454545
    
    
    5. Failure: pivot GroupedData column (@test_sparkSQL.R#1924) -------------------
    `sum4` not equal to correct_answer[, c("year", "R")].
    Component “year”: Mean relative difference: 0.0004961548
    Component “R”: Mean relative difference: 0.5454545
    ```
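
    For reference, here is a minimal sketch of how the setting can be lowered in a plain local SparkR session (an assumption for local reproduction; the actual test harness configures the session elsewhere):

    ```
    library(SparkR)

    # Start a local session with a reduced shuffle partition count
    # (the default is 200; 5 is the value tried above).
    sparkR.session(
      master = "local[4]",
      sparkConfig = list(spark.sql.shuffle.partitions = "5")
    )

    # Any query involving a shuffle (e.g. a groupBy aggregation)
    # now produces 5 shuffle partitions instead of 200.
    df <- createDataFrame(mtcars)
    head(agg(groupBy(df, "gear"), avg(df$mpg)))
    ```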
     
    Cases that combine a shuffle with an R worker don't look very frequent (to be clear, a shuffle alone without an R worker will be fine, IIUC).
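
    To illustrate what a shuffle + R worker case looks like (a hypothetical sketch, not one of the failing tests): `gapply` first shuffles rows by the grouping key and then runs the user's R function in an R worker per group, so it exercises both paths at once.

    ```
    df <- createDataFrame(mtcars)

    # groupBy shuffles rows by key; the function below then runs in
    # an R worker process for each group.
    result <- gapply(
      df,
      "gear",
      function(key, x) {
        data.frame(gear = key[[1]], avg_mpg = mean(x$mpg))
      },
      structType(structField("gear", "double"),
                 structField("avg_mpg", "double"))
    )
    head(collect(result))
    ```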
    
    I don't have a strong opinion on lowering it: if we don't lower it, some tests in the future could hit this problem again; if we do lower it, the required change looks quite large, and such cases might not be very frequent.


