spark-dev mailing list archives

From Dongjoon Hyun <dongj...@apache.org>
Subject Fwd: Question about SPARK-11374 (skip.header.line.count)
Date Fri, 09 Dec 2016 00:01:11 GMT
+dev

I forgot to add @user.

Dongjoon.

---------- Forwarded message ---------
From: Dongjoon Hyun <dongjoon@apache.org>
Date: Thu, Dec 8, 2016 at 16:00
Subject: Question about SPARK-11374 (skip.header.line.count)
To: <dev@spark.apache.org>


Hi, All.



Could you give me your opinions?



There is an old Spark issue, SPARK-11374, about removing header lines from
text files.

Currently, Spark supports removing CSV header lines in the following way.



```
scala> spark.read.option("header","true").csv("/data").show
+---+---+
| c1| c2|
+---+---+
|  1|  a|
|  2|  b|
+---+---+
```



In the SQL world, we could support this the Hive way, with the
`skip.header.line.count` table property.



```
scala> sql("""CREATE TABLE t1 (id INT, value VARCHAR(10))
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
         STORED AS TEXTFILE LOCATION '/data'
         TBLPROPERTIES('skip.header.line.count'='1')""")

scala> sql("SELECT * FROM t1").show
+---+-----+
| id|value|
+---+-----+
|  1|    a|
|  2|    b|
+---+-----+
```



Although I made a PR for this based on the JIRA issue, I want to know whether
this is a really needed feature.

Is it needed for your use cases? Or is it enough for you to remove the header
lines in a preprocessing stage?

If this is too old and no longer relevant these days, I'll close the PR and the
JIRA issue as WON'T FIX.
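
For comparison, the preprocessing alternative could look roughly like the sketch below. It is only an illustration in plain Scala: `dropHeaderLines` is a hypothetical helper (not part of Spark or of the PR), and the sample content is assumed, since the actual files under `/data` are not shown here.

```scala
// Hypothetical helper for illustration only; it is not part of Spark or of the PR.
// Splits one file's content into lines and skips the first `count` of them,
// the way a per-file header skip would.
def dropHeaderLines(content: String, count: Int): Seq[String] =
  content
    .split("\r?\n")      // handle both Unix and Windows line endings
    .toSeq
    .drop(count)         // skip the header lines
    .filter(_.nonEmpty)  // ignore blank lines

// Assumed sample content; the real files under /data are not shown in this thread.
val sample = "c1,c2\n1,a\n2,b\n"

val rows = dropHeaderLines(sample, 1)
rows.foreach(println)  // rows == Seq("1,a", "2,b")
```

In a Spark job, one way to apply such a helper per file would be through `SparkContext.wholeTextFiles`, which reads each file as a single string, so that approach is probably only reasonable for small files.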



Thank you all in advance!



Best,

Dongjoon.


