From: Márton Balassi <balassi.marton@gmail.com>
Date: Mon, 29 Jun 2015 06:08:37 +0200
Subject: Re: Best way to write data to HDFS by Flink
To: user@flink.apache.org

Dear Hawin,

As for your issues with running the Flink Kafka examples: were those
resolved by Aljoscha's comment in the other thread? :)

Best,

Marton

On Fri, Jun 26, 2015 at 8:40 AM, Hawin Jiang wrote:

> Hi Stephan
>
> Yes, that is a great idea. If it is possible, I will try my best to
> contribute some code to Flink.
> But I have to run some Flink examples first to understand Apache Flink.
> I just ran some Kafka-with-Flink examples, and none of them worked for
> me. I am so sad right now.
> I haven't had any trouble running the Kafka examples from kafka.apache.org
> so far.
> Please advise.
> Thanks.
>
> Best regards
> Hawin
>
> On Wed, Jun 24, 2015 at 1:02 AM, Stephan Ewen wrote:
>
>> Hi Hawin!
>>
>> If you are creating code for such an output into different
>> files/partitions, it would be amazing if you could contribute this code to
>> Flink.
>>
>> It seems like a very common use case, so this functionality will be
>> useful to other users as well!
>>
>> Greetings,
>> Stephan
>>
>> On Tue, Jun 23, 2015 at 3:36 PM, Márton Balassi wrote:
>>
>>> Dear Hawin,
>>>
>>> We do not have out-of-the-box support for that; it is something you
>>> would need to implement yourself in a custom SinkFunction.
>>>
>>> Best,
>>>
>>> Marton
>>>
>>> On Mon, Jun 22, 2015 at 11:51 PM, Hawin Jiang wrote:
>>>
>>>> Hi Marton
>>>>
>>>> If we receive a huge amount of data from Kafka and write it to HDFS
>>>> immediately, we should use the buffer timeout described at your URL.
>>>> I am not sure whether you have Flume experience. Flume can be
>>>> configured with a buffer size and partitions as well.
>>>>
>>>> What about the partition?
>>>> For example:
>>>> I want to write a 1-minute buffer file to HDFS under
>>>> /data/flink/year=2015/month=06/day=22/hour=21.
>>>> If the partition (/data/flink/year=2015/month=06/day=22/hour=21) is
>>>> already there, there is no need to create it. Otherwise, Flume creates it
>>>> automatically, and Flume makes sure incoming data goes to the right
>>>> partition.
>>>>
>>>> I am not sure whether Flink also provides a similar partitioning API
>>>> or configuration for this.
>>>> Thanks.
>>>>
>>>> Best regards
>>>> Hawin
>>>>
>>>> On Wed, Jun 10, 2015 at 10:31 AM, Hawin Jiang wrote:
>>>>
>>>>> Thanks Marton
>>>>> I will use this code to implement my testing.
>>>>>
>>>>> Best regards
>>>>> Hawin
>>>>>
>>>>> On Wed, Jun 10, 2015 at 1:30 AM, Márton Balassi <
>>>>> balassi.marton@gmail.com> wrote:
>>>>>
>>>>>> Dear Hawin,
>>>>>>
>>>>>> You can pass an HDFS path to DataStream's and DataSet's writeAsText
>>>>>> and writeAsCsv methods.
>>>>>> I assume that you are running a streaming topology, because your
>>>>>> source is Kafka, so it would look like the following:
>>>>>>
>>>>>> StreamExecutionEnvironment env =
>>>>>>     StreamExecutionEnvironment.getExecutionEnvironment();
>>>>>>
>>>>>> env.addSource(new PersistentKafkaSource(..))
>>>>>>    .map(/* do your operations */)
>>>>>>    .writeAsText("hdfs://<namenode_name>:<namenode_port>/path/to/your/file");
>>>>>>
>>>>>> Check out the relevant section of the streaming docs for more info.
>>>>>> [1] >>>>>> >>>>>> [1] >>>>>> http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming= _guide.html#connecting-to-the-outside-world >>>>>> >>>>>> Best, >>>>>> >>>>>> Marton >>>>>> >>>>>> On Wed, Jun 10, 2015 at 10:22 AM, Hawin Jiang >>>>>> wrote: >>>>>> >>>>>>> Hi All >>>>>>> >>>>>>> >>>>>>> >>>>>>> Can someone tell me what is the best way to write data to HDFS when >>>>>>> Flink received data from Kafka? >>>>>>> >>>>>>> Big thanks for your example. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Best regards >>>>>>> >>>>>>> Hawin >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >> > --047d7b86cf3ea572050519a03f4d Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable