flink-dev mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: [jira] [Created] (FLINK-1170) Localization of InputSplits is not working properly
Date Fri, 17 Oct 2014 11:02:24 GMT
I agree, we should cancel the release, fix this, and make a new release
candidate.

Stephan


On Fri, Oct 17, 2014 at 12:11 PM, Fabian Hueske <fhueske@apache.org> wrote:

> Yes, that was intentional.
>
> The whole point of using a parallel engine is to process large datasets.
> Otherwise you could do it in Python on a single box...
> Remote reads will severely impact the performance and might cause
> significant performance regression.
>
> 2014-10-17 12:04 GMT+02:00 Robert Metzger <rmetzger@apache.org>:
>
> > Did you intentionally post to the mailing list?
> >
> > I'm investigating the issue.
> > So far, I found that the hostname has never been passed to the input split
> > assigner. I guess this issue was introduced by the recent JobManager
> > changes.
> > Secondly, Flink uses the fully qualified hostname, whereas HDFS uses only
> > the short hostname. This caused a string mismatch.
> >
> > I wouldn't cancel the release, because we are at a point where it is faster
> > to vote on a bugfix release.
> > The issue is not a showstopper for using Flink. It's just slow on large
> > datasets.
> >
> > On Fri, Oct 17, 2014 at 11:58 AM, Fabian Hueske <fhueske@apache.org> wrote:
> >
> > > This is a critical issue and sounds a bit like a release blocker for 0.7
> > > to me.
> > >
> > > Other opinions?
> > >
> > > 2014-10-17 11:25 GMT+02:00 Robert Metzger (JIRA) <jira@apache.org>:
> > >
> > > > Robert Metzger created FLINK-1170:
> > > > -------------------------------------
> > > >
> > > >              Summary: Localization of InputSplits is not working properly
> > > >                  Key: FLINK-1170
> > > >                  URL: https://issues.apache.org/jira/browse/FLINK-1170
> > > >              Project: Flink
> > > >           Issue Type: Bug
> > > >           Components: Distributed Runtime
> > > >             Reporter: Robert Metzger
> > > >             Assignee: Robert Metzger
> > > >
> > > >
> > > > While running some benchmarks, I found that Flink is not properly
> > > > assigning the InputSplits.
> > > >
> > > > On my testing cluster, ALL splits were assigned to remote HDFS DataNodes,
> > > > which caused a lot of network I/O.
> > > >
> > > >
> > > >
> > > > --
> > > > This message was sent by Atlassian JIRA
> > > > (v6.3.4#6332)
> > > >
> > >
> >
>
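Robert's diagnosis names two causes: the requesting hostname never reached the input split assigner, and Flink compared fully qualified hostnames against the short hostnames reported by HDFS. The following is a minimal, hypothetical Python sketch of locality-preferring split assignment that reproduces both failure modes. `InputSplit`, `short_hostname`, `is_local`, and `next_split` are illustrative names, not Flink's actual API, and normalizing hostnames to their first DNS label is an assumed fix chosen for illustration, not necessarily what the Flink patch does.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class InputSplit:
    """Hypothetical split: an id plus the hosts HDFS reports for its block."""
    split_id: int
    hosts: List[str]  # assumed to be short hostnames, e.g. "worker1"


def short_hostname(host: str) -> str:
    """Assumed normalization: 'Worker1.Example.COM' -> 'worker1'."""
    return host.split(".", 1)[0].lower()


def is_local(split: InputSplit, requesting_host: Optional[str], normalize: bool) -> bool:
    if requesting_host is None:
        # Hostname never passed to the assigner: locality can never match.
        return False
    if normalize:
        target = short_hostname(requesting_host)
        return any(short_hostname(h) == target for h in split.hosts)
    # Exact string comparison: a FQDN never equals a short hostname.
    return requesting_host in split.hosts


def next_split(pending: List[InputSplit], requesting_host: Optional[str],
               normalize: bool = True) -> Tuple[Optional[InputSplit], bool]:
    """Prefer a local split; otherwise fall back to a remote one.

    Returns (split, was_local) and removes the split from `pending`.
    """
    for i, split in enumerate(pending):
        if is_local(split, requesting_host, normalize):
            return pending.pop(i), True
    if pending:
        return pending.pop(0), False  # remote read
    return None, False
```

With exact matching, a task on `worker1.example.com` never recognizes the split stored on `worker1` and receives a remote split, which matches the "ALL splits were remote" symptom in the JIRA issue. Comparing only the first DNS label is a simplification: it can conflate same-named hosts in different domains and does not handle raw IP addresses, so resolving both sides to a canonical name may be the safer design.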
