flink-issues mailing list archives

From "zhangminglei (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-9407) Support orc rolling sink writer
Date Sun, 08 Jul 2018 11:06:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-9407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangminglei updated FLINK-9407:
--------------------------------
    Description: 
Currently, we only support {{StringWriter}}, {{SequenceFileWriter}} and {{AvroKeyValueSinkWriter}}.
I suggest adding an ORC writer for the rolling sink.

Test results are below, FYI.

I tested the PR and verified the results with Spark SQL. The records written by the sink can
be read back as expected. I will add more tests in the next couple of days, including
performance under compression, and more unit tests.

{code:java}
scala> spark.read.orc("hdfs://10.199.196.0:9000/data/hive/man/2018-07-06--21")
res1: org.apache.spark.sql.DataFrame = [name: string, age: int ... 1 more field]

scala>

scala> res1.registerTempTable("tablerice")
warning: there was one deprecation warning; re-run with -deprecation for details

scala> spark.sql("select * from tablerice")
res3: org.apache.spark.sql.DataFrame = [name: string, age: int ... 1 more field]

scala> res3.show(3)
+-----+---+-------+
| name|age|married|
+-----+---+-------+
|Sagar| 26|  false|
|Sagar| 30|  false|
|Sagar| 34|  false|
+-----+---+-------+
only showing top 3 rows
{code}
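
To make the role of a writer in a rolling sink concrete, here is a minimal, self-contained sketch. It is only an illustration under stated assumptions: {{PartWriter}}, {{LineWriter}} and {{RollingSink}} are hypothetical names, not Flink's API and not the code in the PR. It writes plain text to in-memory buffers instead of ORC to HDFS, and it rolls on a simple size threshold. An ORC writer would take the place of {{LineWriter}} behind the same kind of contract.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of a rolling sink. Not Flink's actual API. */
class RollingSinkSketch {

    /** Hypothetical writer contract: one writer instance writes one part file at a time. */
    interface PartWriter<T> {
        void open(ByteArrayOutputStream out); // start writing a new part
        void write(T element);                // append one record
        long getPos();                        // bytes written to the current part
        void close();                         // finish the current part
    }

    /** Plain-text writer, one record per line. An ORC writer would plug in here instead. */
    static class LineWriter implements PartWriter<String> {
        private ByteArrayOutputStream out;

        public void open(ByteArrayOutputStream out) {
            this.out = out;
        }

        public void write(String element) {
            byte[] bytes = (element + "\n").getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
        }

        public long getPos() {
            return out.size();
        }

        public void close() {
            out = null;
        }
    }

    /** Starts a new in-memory "part file" once the current one has reached maxPartBytes. */
    static class RollingSink<T> {
        private final PartWriter<T> writer;
        private final long maxPartBytes;
        private final List<ByteArrayOutputStream> parts = new ArrayList<>();
        private boolean open = false;

        RollingSink(PartWriter<T> writer, long maxPartBytes) {
            this.writer = writer;
            this.maxPartBytes = maxPartBytes;
        }

        void invoke(T element) {
            // Roll: close the current part if it has reached the size threshold.
            if (open && writer.getPos() >= maxPartBytes) {
                writer.close();
                open = false;
            }
            // Open a fresh part if none is currently open.
            if (!open) {
                ByteArrayOutputStream part = new ByteArrayOutputStream();
                parts.add(part);
                writer.open(part);
                open = true;
            }
            writer.write(element);
        }

        void close() {
            if (open) {
                writer.close();
                open = false;
            }
        }

        /** Returns the text content of every part written so far, in order. */
        List<String> partContents() {
            List<String> result = new ArrayList<>();
            for (ByteArrayOutputStream part : parts) {
                result.add(new String(part.toByteArray(), StandardCharsets.UTF_8));
            }
            return result;
        }
    }
}
```

With a threshold of 18 bytes and three 9-byte records, the first two records land in the first part and the third record starts a second part. The sink only talks to the writer through the small contract, which is why adding a new format should not require changes to the rolling logic.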


  was:
Currently, we only support {{StringWriter}}, {{SequenceFileWriter}} and {{AvroKeyValueSinkWriter}}.
I would suggest add an orc writer for rolling sink.

Below, FYI.

I tested the PR and verify the results with spark sql. Obviously, we can get the results of
what we written down before. But I will give more tests in the next couple of days. Including
the performance under compression. And more UT tests.

{code:java}
scala> spark.read.orc("hdfs://10.199.196.0:9000/data/hive/man/2018-07-06--21")
res1: org.apache.spark.sql.DataFrame = [name: string, age: int ... 1 more field]

scala>

scala> res1.registerTempTable("tablerice")
warning: there was one deprecation warning; re-run with -deprecation for details

scala> spark.sql("select * from tablerice")
res3: org.apache.spark.sql.DataFrame = [name: string, age: int ... 1 more field]

scala> res3.show(3)
+-----+---+-------+
| name|age|married|
+-----+---+-------+
|Sagar| 26|  false|
|Sagar| 30|  false|
|Sagar| 34|  false|
+-----+---+-------+
only showing top 3 rows
{code}



> Support orc rolling sink writer
> -------------------------------
>
>                 Key: FLINK-9407
>                 URL: https://issues.apache.org/jira/browse/FLINK-9407
>             Project: Flink
>          Issue Type: New Feature
>          Components: filesystem-connector
>            Reporter: zhangminglei
>            Assignee: zhangminglei
>            Priority: Major
>              Labels: patch-available, pull-request-available



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
