spark-issues mailing list archives

From "Dapeng Sun (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-21661) SparkSQL can't merge load table from Hadoop
Date Tue, 08 Aug 2017 04:22:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-21661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dapeng Sun updated SPARK-21661:
-------------------------------
    Description: 
Here is the original text of external table on HDFS:
{noformat}
Permission	Owner	Group	Size	Last Modified	Replication	Block Size	Name
-rw-r--r--	root	supergroup	0 B	8/6/2017, 11:43:03 PM	3	256 MB	income_band_001.dat
-rw-r--r--	root	supergroup	0 B	8/6/2017, 11:39:31 PM	3	256 MB	income_band_002.dat
...
-rw-r--r--	root	supergroup	327 B	8/6/2017, 11:44:47 PM	3	256 MB	income_band_530.dat
{noformat}
After a SparkSQL load, every input file produces an output file, even when the input file is 0 B.
With a Hive load, the data files are merged according to the size of the original files.

CREATE EXTERNAL TABLE t1 (a int,b string) 
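
To illustrate the size-based merging described above, here is a minimal sketch in plain Python. It is not Hive's or Spark's actual implementation; the function name `plan_merged_outputs` and the `target_bytes` parameter are hypothetical, and the 256 MB target is assumed only because it matches the block size in the listing. The sketch skips empty inputs and packs the rest into output groups by size, which is the behaviour the report asks for.

```python
from typing import List, Tuple


def plan_merged_outputs(
    files: List[Tuple[str, int]], target_bytes: int
) -> List[List[str]]:
    """Group (name, size) inputs into output groups of at most target_bytes.

    Hypothetical helper for illustration only: empty files are skipped,
    so they never produce an output file of their own.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        if size == 0:
            # A 0 B input contributes no data, so it needs no output file.
            continue
        if current and current_size + size > target_bytes:
            # The current group is full; start a new one.
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Only the files shown explicitly in the listing; the elided ones are omitted.
inputs = [
    ("income_band_001.dat", 0),
    ("income_band_002.dat", 0),
    ("income_band_530.dat", 327),
]
plan = plan_merged_outputs(inputs, target_bytes=256 * 1024 * 1024)
print(plan)
```

With these three inputs the plan contains a single output group holding only income_band_530.dat, instead of one output file per input file.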

  was:
Here is the original text of external table on HDFS:
{noformat}
Permission	Owner	Group	Size	Last Modified	Replication	Block Size	Name
-rw-r--r--	root	supergroup	0 B	8/6/2017, 11:43:03 PM	3	256 MB	income_band_001.dat
-rw-r--r--	root	supergroup	0 B	8/6/2017, 11:39:31 PM	3	256 MB	income_band_002.dat
...
-rw-r--r--	root	supergroup	327 B	8/6/2017, 11:44:47 PM	3	256 MB	income_band_530.dat
{noformat}
After a SparkSQL load, every input file produces an output file, even when the input file is 0 B.
With a Hive load, the data files are merged according to the size of the original files.



> SparkSQL can't merge load table from Hadoop
> -------------------------------------------
>
>                 Key: SPARK-21661
>                 URL: https://issues.apache.org/jira/browse/SPARK-21661
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Dapeng Sun
>
> Here is the original text of external table on HDFS:
> {noformat}
> Permission	Owner	Group	Size	Last Modified	Replication	Block Size	Name
> -rw-r--r--	root	supergroup	0 B	8/6/2017, 11:43:03 PM	3	256 MB	income_band_001.dat
> -rw-r--r--	root	supergroup	0 B	8/6/2017, 11:39:31 PM	3	256 MB	income_band_002.dat
> ...
> -rw-r--r--	root	supergroup	327 B	8/6/2017, 11:44:47 PM	3	256 MB	income_band_530.dat
> {noformat}
> After a SparkSQL load, every input file produces an output file, even when the input file is 0 B.
> With a Hive load, the data files are merged according to the size of the original files.
> CREATE EXTERNAL TABLE t1 (a int,b string) 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


