flink-user mailing list archives

From "Hawin Jiang" <hawin.ji...@gmail.com>
Subject RE: Best way to write data to HDFS by Flink
Date Mon, 29 Jun 2015 07:56:08 GMT
Dear Marton


Thanks for asking. Yes, it is working now.

However, the TPS is not very good. I have run into four issues, listed below:


1.       My TPS is around 2,000 events per second, but at the 2015 Los Angeles Big Data Day
yesterday I saw a company achieve 132K events per second on a single node, and 282K per
second on two nodes. They used Kafka + Spark.

As you know, I used Kafka + Flink. Maybe we have to do more investigation on my side.


2.       Regarding my performance testing, I used JMeter to produce data to Kafka while Flink
wrote the data to HDFS. The total message count on the JMeter side does not match the count
on the HDFS side.


3.       I found that Flink randomly created folders 1, 2, 3 and 4. Only folders 1 and 4
contain files; folders 2 and 3 are empty.


4.       I am going to develop some code to write data to a /data/flink/year/month/day/hour
folder structure. I think that folder structure will work well with the Flink Table API in the future.
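Deriving a folder like that from an event timestamp only takes a few lines. A minimal sketch in plain Java (the /data/flink root and the UTC zone are assumptions for illustration, not anything Flink prescribes):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class PartitionPath {
    // Builds a path like /data/flink/2015/06/22/21 from an epoch-millis timestamp.
    static String forTimestamp(long epochMillis) {
        ZonedDateTime t = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
        return String.format("/data/flink/%04d/%02d/%02d/%02d",
                t.getYear(), t.getMonthValue(), t.getDayOfMonth(), t.getHour());
    }

    public static void main(String[] args) {
        long ts = Instant.parse("2015-06-22T21:51:00Z").toEpochMilli();
        System.out.println(forTimestamp(ts)); // prints /data/flink/2015/06/22/21
    }
}
```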


Please let me know if you have any comments or suggestions for me.





Best regards



From: Márton Balassi [mailto:balassi.marton@gmail.com] 
Sent: Sunday, June 28, 2015 9:09 PM
To: user@flink.apache.org
Subject: Re: Best way to write data to HDFS by Flink


Dear Hawin,


As for your issues with running the Flink Kafka examples: are those resolved with Aljoscha's
comment in the other thread? :)






On Fri, Jun 26, 2015 at 8:40 AM, Hawin Jiang <hawin.jiang@gmail.com> wrote:

Hi Stephan


Yes, that is a great idea. If possible, I will try my best to contribute some code to Flink.

But I have to run some Flink examples first to understand Apache Flink.

I just ran some Kafka-with-Flink examples, and none of them worked for me. I am so sad right now.

I haven't had any trouble running the Kafka examples from kafka.apache.org so far.

Please advise me.





Best regards




On Wed, Jun 24, 2015 at 1:02 AM, Stephan Ewen <sewen@apache.org> wrote:

Hi Hawin!


If you are creating code for such an output into different files/partitions, it would be amazing
if you could contribute it to Flink.


It seems like a very common use case, so this functionality would be useful to other users as well.





On Tue, Jun 23, 2015 at 3:36 PM, Márton Balassi <balassi.marton@gmail.com> wrote:

Dear Hawin,


We do not have out-of-the-box support for that; it is something you would need to implement
yourself in a custom SinkFunction.
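A custom sink along those lines might route each record to a time-based folder. The sketch below is only an illustration: to stay compilable without a Flink dependency it declares a local stand-in for the SinkFunction interface (the real one lives in org.apache.flink.streaming.api.functions.sink), it assumes a made-up "timestamp|payload" record layout, and it buffers in memory where real code would append to an HDFS file.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Local stand-in so this sketch compiles on its own; a real job would
// implement Flink's SinkFunction interface instead.
interface SinkFunction<IN> {
    void invoke(IN value) throws Exception;
}

// Hypothetical sink that routes each record to an hourly partition path.
class HourlyPartitionSink implements SinkFunction<String> {
    final Map<String, List<String>> partitions = new HashMap<>();

    @Override
    public void invoke(String record) {
        // Assumed record layout: "<ISO-8601 timestamp>|<payload>".
        String[] parts = record.split("\\|", 2);
        ZonedDateTime t = Instant.parse(parts[0]).atZone(ZoneOffset.UTC);
        String dir = String.format("/data/flink/%04d/%02d/%02d/%02d",
                t.getYear(), t.getMonthValue(), t.getDayOfMonth(), t.getHour());
        // A real sink would open or append an HDFS file keyed by dir here.
        partitions.computeIfAbsent(dir, k -> new ArrayList<>()).add(parts[1]);
    }

    public static void main(String[] args) throws Exception {
        HourlyPartitionSink sink = new HourlyPartitionSink();
        sink.invoke("2015-06-22T21:05:00Z|event-a");
        sink.invoke("2015-06-22T22:10:00Z|event-b");
        System.out.println(sink.partitions.keySet());
    }
}
```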






On Mon, Jun 22, 2015 at 11:51 PM, Hawin Jiang <hawin.jiang@gmail.com> wrote:

Hi  Marton


If we receive a huge amount of data from Kafka and write it to HDFS immediately, we should use
the buffer timeout described in the page you linked.

I am not sure whether you have Flume experience. Flume can be configured with a buffer size and
partitioning as well.


What is a partition?

For example:

I want to write a 1-minute buffer file to HDFS under /data/flink/year=2015/month=06/day=22/hour=21.

If the partition (/data/flink/year=2015/month=06/day=22/hour=21) already exists, there is no need
to create it; otherwise, Flume creates it automatically.

Flume makes sure the incoming data lands in the right partition.


I am not sure whether Flink also provides a similar partition API or configuration for this.
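For what it's worth, the create-if-absent behavior Flume provides is easy to reproduce. The sketch below uses java.nio.file against a local path purely for illustration; for real HDFS you would call Hadoop FileSystem's mkdirs, which is likewise a no-op when the directory already exists. The year=/month= layout mirrors the example path above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class EnsurePartition {
    // Creates the partition directory (and any missing parents) only if absent.
    static Path ensure(Path base, int year, int month, int day, int hour) throws IOException {
        Path dir = base.resolve(String.format("year=%04d/month=%02d/day=%02d/hour=%02d",
                year, month, day, hour));
        // createDirectories is idempotent: an existing directory is left untouched.
        return Files.createDirectories(dir);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("flink-sink");
        Path p = ensure(base, 2015, 6, 22, 21);
        System.out.println(Files.isDirectory(p)); // prints true
        ensure(base, 2015, 6, 22, 21);            // second call is a no-op
    }
}
```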





Best regards



On Wed, Jun 10, 2015 at 10:31 AM, Hawin Jiang <hawin.jiang@gmail.com> wrote:

Thanks Marton

I will use this code to implement my testing.




Best regards



On Wed, Jun 10, 2015 at 1:30 AM, Márton Balassi <balassi.marton@gmail.com> wrote:

Dear Hawin,


You can pass a hdfs path to DataStream's and DataSet's writeAsText and writeAsCsv methods.

I assume that you are running a Streaming topology, because your source is Kafka, so it would
look like the following:


StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

env.addSource(/* your Kafka source */)
      .map(/* do your operations */)
      .writeAsText("hdfs://<namenode>:<port>/path/to/output");


Check out the relevant section of the streaming docs for more info. [1]


[1] http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#connecting-to-the-outside-world






On Wed, Jun 10, 2015 at 10:22 AM, Hawin Jiang <hawin.jiang@gmail.com> wrote:

Hi All


Can someone tell me the best way to write data to HDFS when Flink receives data from Kafka?

Big thanks for your example.





Best regards









