flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Apache Flink transactions
Date Fri, 05 Jun 2015 06:18:05 GMT
Yes, this code seems very reasonable. :D

The way to use this to "modify" a file on HDFS is to read the file,
then filter out some elements and write a new modified file that does
not contain the filtered out elements. As said before, Flink (or
HDFS), does not allow in-place modification of files.

On Fri, Jun 5, 2015 at 4:55 AM, Chiwan Park <chiwanpark@icloud.com> wrote:
> Basically Flink uses Data Model in functional programming model. All DataSet is immutable.
This means we cannot modify DataSet but ㅐonly can create new DataSet with modification.
Update, delete query are represented as writing filtered DataSet.
> Following scala sample shows select, insert, update, and remove query in Flink. (I’m
not sure this is best practice.)
>
> case class MyType(id: Int, value1: String, value2: String)
>
> // load data (you can use readCsvFile, or something else.)
> val data = env.fromElements(MyType(0, “test”, “test2”), MyType(1, “hello”,
“flink”), MyType(2, “flink”, “good”))
>
> // selecting
> // same as SELECT * FROM data WHERE id = 1
> val selectedData1 = data.filter(_.id == 1)
> // same as SELECT value1 FROM data WHERE id = 1
> val selectedData2 = data.filter(_.id == 1).map(_.value1)
>
> // removing is same as selecting such as following
> // same as DELETE FROM data WHERE id = 1, but DataSet data is not changed. the result
is removedData
> val removedData = data.filter(_.id != 1)
>
> // inserting
> // same as INSERT INTO data (id, value1, value2) VALUES (3, “new”, “data”)
> val newData = env.fromElements(MyType(3, “new”, “data”))
> val insertedData = data.union(newData)
>
> // updating
> // UPDATE data SET value1 = “updated”, value2 = “data” WHERE id = 1, but DataSet
data is not changed.
> val updatedData = data.map { x => if (x.id == 1) MyType(x.id, “updated”, “data”)
else x }
>
> Regards,
> Chiwan Park
>
>> On Jun 5, 2015, at 9:22 AM, hawin <hawin.jiang@gmail.com> wrote:
>>
>> Hi  Chiwan
>>
>> Thanks for your information.  I knew Flink is not DBMS. I want to know what
>> is the flink way to select, insert, update and delete data on HDFS.
>>
>>
>> @Till
>> Maybe union is a way to insert data. But I think it will cost some
>> performance issue.
>>
>>
>> @Stephan
>> Thanks for your suggestion.  I have checked apache flink roadmap.  SQL on
>> flink will be released on Q3/Q4 2015. Will it support insertion, deletion
>> and update data on HDFS?
>> You guys already provided a nice example for selecting data on HDFS.  Such
>> as: TPCHQuery10 and TPCHQuery3.
>> Do you have other examples for inserting, updating and removing data on HDFS
>> by Apache flink
>>
>> Thanks
>>
>>
>>
>>
>> --
>> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Apache-Flink-transactions-tp1457p1491.html
>> Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.
>
>
>
>

Mime
View raw message