flink-dev mailing list archives

From Krzysztof Pasierbinski <Krzysztof.Pasierbin...@dfki.de>
Subject Re: Cluster execution of an example program ("Word count") and a problem related to the modificated example
Date Sun, 29 Jun 2014 13:06:34 GMT
Hi all,
thank you all for the prompt replies. It is great to know that there is such strong community support.
Yes, indeed, I don't use Hadoop yet. I wanted to try out the Flink framework and then integrate
it with Hadoop. I have read somewhere that Hadoop is not obligatory.
I wonder why the same program with the same configuration works fine for small files and
this error appears only for the bigger ones. The example program "Word count" always works
fine, so I suppose that the mistake is somewhere on my side.


-----Original Message-----
From: Aljoscha Krettek [mailto:aljoscha@apache.org]
Sent: Sunday, 29 June 2014 09:24
To: dev@flink.incubator.apache.org
Subject: Re: Cluster execution of an example program ("Word count") and a problem related
to the modificated example

Hi Krzysztof,
regarding the file access problem: from the path, it looks like you are accessing the files as
local files rather than as files in a distributed file system (HDFS is the default here). So one
of the nodes can access the file because it is actually on the machine where the code is running,
while the other code executes on a machine where the file is not available. This page explains
how to set up Hadoop with HDFS:
http://hadoop.apache.org/docs/r1.2.1/cluster_setup.html . You only need to start HDFS, though,
with "bin/start-dfs.sh". To access files inside HDFS from Flink, you would use a path
such as "hdfs:///foo/bar".
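
To make the difference between the two kinds of paths concrete, here is a minimal sketch in plain Java. It is not Flink's own path resolution logic; it only inspects the URI scheme with the standard java.net.URI class. The class name PathSchemeCheck, the method describe, and the example paths under /tmp are hypothetical and chosen for illustration only.

```java
import java.net.URI;

public class PathSchemeCheck {

    // Classifies a path by its URI scheme. A path without a scheme, or with
    // the "file" scheme, refers to the local file system of whichever machine
    // opens it. An "hdfs" scheme refers to the distributed file system.
    static String describe(String path) {
        URI uri = URI.create(path);
        String scheme = uri.getScheme();
        if (scheme == null || scheme.equals("file")) {
            return "local";
        }
        if (scheme.equals("hdfs")) {
            return "hdfs";
        }
        return "other";
    }

    public static void main(String[] args) {
        // Hypothetical example paths; only "hdfs:///foo/bar" is from the thread.
        String[] paths = {"/tmp/input.txt", "file:///tmp/input.txt", "hdfs:///foo/bar"};
        for (String p : paths) {
            System.out.println(p + " -> " + describe(p));
        }
    }
}
```

A path classified as "local" here only works on a cluster if the same file exists at the same location on every worker machine, which would explain why some tasks can read the input and others cannot.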

Please write again if you need more help.

Aljoscha


On Sat, Jun 28, 2014 at 10:57 PM, Ufuk Celebi <u.celebi@fu-berlin.de> wrote:

>
> > On 28 Jun 2014, at 22:52, Stephan Ewen <sewen@apache.org> wrote:
> >
> > Hey!
> >
> > You can always get the result in a single file by setting the
> > parallelism of the sink task to one, for example with the line
> > "result.writeAsText(path).parallelism(1)".
>
> Oh sure. I realized this after sending the mail. Thanks for pointing 
> it out. :)
>