From: Márton Balassi <balassi.marton@gmail.com>
Date: Mon, 29 Jun 2015 06:08:37 +0200
Subject: Re: Best way to write data to HDFS by Flink
To: user@flink.apache.org

Dear Hawin,

As for your issues with running the Flink Kafka examples: were those
resolved by Aljoscha's comment in the other thread? :)

Best,

Marton

On Fri, Jun 26, 2015 at 8:40 AM, Hawin Jiang wrote:

> Hi Stephan
>
> Yes, that is a great idea. If it is possible, I will try my best to
> contribute some code to Flink.
> But I have to run some Flink examples first to understand Apache Flink.
> I just ran some Kafka-with-Flink examples, and none of them worked for
> me. I am so sad right now.
> I haven't had any trouble running the Kafka examples from kafka.apache.org
> so far.
> Please advise.
> Thanks.
>
> Best regards
> Hawin
>
> On Wed, Jun 24, 2015 at 1:02 AM, Stephan Ewen wrote:
>
>> Hi Hawin!
>>
>> If you are creating code for such an output into different
>> files/partitions, it would be amazing if you could contribute this code to
>> Flink.
>>
>> It seems like a very common use case, so this functionality will be
>> useful to other users as well!
>>
>> Greetings,
>> Stephan
>>
>> On Tue, Jun 23, 2015 at 3:36 PM, Márton Balassi wrote:
>>
>>> Dear Hawin,
>>>
>>> We do not have out-of-the-box support for that; it is something you
>>> would need to implement yourself in a custom SinkFunction.
>>>
>>> Best,
>>>
>>> Marton
>>>
>>> On Mon, Jun 22, 2015 at 11:51 PM, Hawin Jiang wrote:
>>>
>>>> Hi Marton
>>>>
>>>> If we receive a huge amount of data from Kafka and write it to HDFS
>>>> immediately, we should use the buffer timeout described at your URL.
>>>> I am not sure whether you have Flume experience. Flume can be
>>>> configured with a buffer size and partitions as well.
>>>>
>>>> What about the partition?
>>>> For example:
>>>> I want to write a 1-minute buffer file to HDFS under
>>>> /data/flink/year=2015/month=06/day=22/hour=21.
>>>> If the partition (/data/flink/year=2015/month=06/day=22/hour=21) is
>>>> already there, there is no need to create it. Otherwise, Flume creates it
>>>> automatically, and Flume makes sure incoming data goes to the right
>>>> partition.
>>>>
>>>> I am not sure whether Flink also provides a similar partitioning API
>>>> or configuration for this.
>>>> Thanks.
>>>>
>>>> Best regards
>>>> Hawin
>>>>
>>>> On Wed, Jun 10, 2015 at 10:31 AM, Hawin Jiang wrote:
>>>>
>>>>> Thanks Marton
>>>>> I will use this code to implement my testing.
>>>>>
>>>>> Best regards
>>>>> Hawin
>>>>>
>>>>> On Wed, Jun 10, 2015 at 1:30 AM, Márton Balassi <
>>>>> balassi.marton@gmail.com> wrote:
>>>>>
>>>>>> Dear Hawin,
>>>>>>
>>>>>> You can pass an HDFS path to DataStream's and DataSet's writeAsText
>>>>>> and writeAsCsv methods.
>>>>>> I assume that you are running a streaming topology, because your
>>>>>> source is Kafka, so it would look like the following:
>>>>>>
>>>>>> StreamExecutionEnvironment env =
>>>>>>     StreamExecutionEnvironment.getExecutionEnvironment();
>>>>>>
>>>>>> env.addSource(new PersistentKafkaSource(..))
>>>>>>    .map(/* do your operations */)
>>>>>>    .writeAsText("hdfs://<namenode_name>:<namenode_port>/path/to/your/file");
>>>>>>
>>>>>> Check out the relevant section of the streaming docs for more info.
>>>>>> [1] >>>>>> >>>>>> [1] >>>>>> http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming= _guide.html#connecting-to-the-outside-world >>>>>> >>>>>> Best, >>>>>> >>>>>> Marton >>>>>> >>>>>> On Wed, Jun 10, 2015 at 10:22 AM, Hawin Jiang >>>>>> wrote: >>>>>> >>>>>>> Hi All >>>>>>> >>>>>>> >>>>>>> >>>>>>> Can someone tell me what is the best way to write data to HDFS when >>>>>>> Flink received data from Kafka? >>>>>>> >>>>>>> Big thanks for your example. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Best regards >>>>>>> >>>>>>> Hawin >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >> > --047d7b86cf3ea572050519a03f4d Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable