spark-issues mailing list archives

From "Sahana HA (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-20794) Spark show() command on dataset does not retrieve consistent rows from DASHDB data source
Date Thu, 18 May 2017 07:20:04 GMT
Sahana HA created SPARK-20794:
---------------------------------

             Summary: Spark show() command on dataset does not retrieve consistent rows from DASHDB data source
                 Key: SPARK-20794
                 URL: https://issues.apache.org/jira/browse/SPARK-20794
             Project: Spark
          Issue Type: Question
          Components: Spark Core
    Affects Versions: 2.0.0
            Reporter: Sahana HA
            Priority: Minor


When a user creates a DataFrame from the DASHDB data source (a relational database) and runs
df.show(5), each execution returns a different set of rows. We are aware that show(5) simply
returns the first 5 rows of the first partition(s) it reads, so its output is not guaranteed
to be the same across executions.
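A minimal sketch of the behavior described above, assuming a simplified model rather than Spark's actual code: each partition is a list of rows, and show(n) fetches rows roughly the way take(n + 1) does, reading partitions in index order and stopping once it has enough. The function names are hypothetical.

```scala
// Simplified model (an assumption, not Spark's source code): a Dataset is a list of
// partitions, and each partition yields its rows in some order.

// Rows fetched for show(n): roughly take(n + 1), reading partitions in index order and
// stopping as soon as enough rows have been collected.
def rowsFetchedForShow(partitions: Seq[Seq[Int]], n: Int): List[Int] =
  partitions.iterator.flatMap(_.iterator).take(n + 1).toList

// Rows actually printed; the extra fetched row only triggers "only showing top n rows".
def rowsPrinted(partitions: Seq[Seq[Int]], n: Int): List[Int] =
  rowsFetchedForShow(partitions, n).take(n)

// The first partition alone has enough rows, so the second one is never read.
println(rowsPrinted(Seq(1 to 10, 11 to 20), 5)) // List(1, 2, 3, 4, 5)
// A short first partition spills over into the next one.
println(rowsPrinted(Seq(Seq(1, 2, 3), Seq(4, 5, 6, 7)), 5)) // List(1, 2, 3, 4, 5)
// Same rows, different order inside the partition: different output.
println(rowsPrinted(Seq(Seq(6, 5, 4, 3, 2, 1)), 5)) // List(6, 5, 4, 3, 2)
```

Under this model the printed rows depend only on the leading partition(s) and on the order in which those partitions yield their rows.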

However, when we run the same show(5) command against S3 storage or Bluemix Object Store
(non-relational data sources), we always get the same rows, in the same order, on every execution.

We would like to understand why DASHDB behaves differently from other data sources such as
S3 and Bluemix Object Store. Is the issue in Spark or in DASHDB alone? Or does this
inconsistent-row behavior occur with all relational data sources?
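One plausible explanation, offered as an assumption rather than a confirmed diagnosis: a file-based source reads each split in its stored byte order, so the leading partition yields the same rows every time, whereas a relational source (assuming the connector issues a plain SELECT without ORDER BY) gets back a result for which SQL guarantees no row order, so the database may return rows in a different order on each execution. The toy model below uses hypothetical names, and its per-execution shuffle only stands in for that unspecified ordering. It also shows the usual workaround of sorting on a unique key before limiting, which in Spark would be written as dashdb.orderBy("CUST_ORDER_NUMBER").show(5), assuming CUST_ORDER_NUMBER is unique.

```scala
import scala.util.Random

// Toy model; every name here is hypothetical. Two partitions of ten rows each, where the
// integers stand in for unique keys such as CUST_ORDER_NUMBER.
val stored: Seq[Seq[Int]] = Seq((1 to 10).toList, (11 to 20).toList)

// First n rows, reading partitions in index order (a simplified show(n)).
def firstN(partitions: Seq[Seq[Int]], n: Int): List[Int] =
  partitions.iterator.flatMap(_.iterator).take(n).toList

// File-like source: each split is read in its stored byte order on every execution.
def fileScan(execution: Int): Seq[Seq[Int]] = stored

// Relational-like source (assumption): each partition comes from a SELECT without ORDER BY,
// so the database may return rows in any order; modeled as a shuffle seeded per execution.
def jdbcScan(execution: Int): Seq[Seq[Int]] = {
  val rng = new Random(execution.toLong)
  stored.map(p => rng.shuffle(p))
}

// Workaround: sort on a unique key before limiting, the idea behind orderBy(...).show(n).
def firstNSorted(partitions: Seq[Seq[Int]], n: Int): List[Int] =
  partitions.flatten.sorted.take(n).toList

val executions = 0 until 10
println(executions.map(e => firstN(fileScan(e), 5)).distinct)       // one distinct result
println(executions.map(e => firstN(jdbcScan(e), 5)).distinct.size)  // almost certainly > 1
println(executions.map(e => firstNSorted(jdbcScan(e), 5)).distinct) // one distinct result
```

If the sort key is not unique, ties can still come back in a different order, so a fully repeatable show() needs a key (or combination of columns) that identifies each row.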

Repro snippet:

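// Load the data from dashdb ("packageName" and dashdbreadOptions stand in for the
// connector's format name and its map of read options)
val dashdb = sqlContext.read.format("packageName").options(dashdbreadOptions).load()

// execution #1

dashdb.show(5)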

+--------------------+------------+-----------------+-------+-----+-------------+------+---+--------------+------------+
|        PRODUCT_LINE|PRODUCT_TYPE|CUST_ORDER_NUMBER|   CITY|STATE|      COUNTRY|GENDER|AGE|MARITAL_STATUS|  PROFESSION|
+--------------------+------------+-----------------+-------+-----+-------------+------+---+--------------+------------+
|Personal Accessories|     Eyewear|           107861|Rutland|   VT|United States|     F| 39|       Married|       Sales|
|   Camping Equipment|    Lanterns|           189003| Sydney|  NSW|    Australia|     F| 20|        Single| Hospitality|
|   Camping Equipment|Cooking Gear|           107863| Sydney|  NSW|    Australia|     F| 20|        Single| Hospitality|
|Personal Accessories|     Eyewear|           189005|Villach|   NA|      Austria|     F| 37|       Married|Professional|
|Personal Accessories|     Eyewear|           107865|Villach|   NA|      Austria|     F| 37|       Married|Professional|
+--------------------+------------+-----------------+-------+-----+-------------+------+---+--------------+------------+
only showing top 5 rows

// execution #2

dashdb.show(5)

+--------------------+------------+-----------------+------------+-----+--------------+------+---+--------------+-----------+
|        PRODUCT_LINE|PRODUCT_TYPE|CUST_ORDER_NUMBER|        CITY|STATE|       COUNTRY|GENDER|AGE|MARITAL_STATUS| PROFESSION|
+--------------------+------------+-----------------+------------+-----+--------------+------+---+--------------+-----------+
|Mountaineering Eq...|       Tools|           112835|  Portsmouth|   NA|United Kingdom|     M| 24|        Single|      Other|
|   Camping Equipment|Cooking Gear|           193902|Jacksonville|   FL| United States|     F| 22|        Single|Hospitality|
|   Camping Equipment|       Packs|           112837|Jacksonville|   FL| United States|     F| 22|        Single|Hospitality|
|Mountaineering Eq...|        Rope|           193904|Jacksonville|   FL| United States|     F| 31|       Married|      Other|
|      Golf Equipment|     Putters|           112839|Jacksonville|   FL| United States|     F| 31|       Married|      Other|
+--------------------+------------+-----------------+------------+-----+--------------+------+---+--------------+-----------+
only showing top 5 rows





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

