flink-user mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: Flink 1.1.3 OOME Permgen
Date Mon, 05 Dec 2016 11:37:04 GMT
I executed this snippet in each Flink job:

@Override
public void open(Configuration config) {
  ObjectMapper somethingWithJackson = new ObjectMapper();
  try {
    ObjectNode on = somethingWithJackson.readValue("{\"a\": \"b\"}", ObjectNode.class);
  } catch (IOException e) {
    throw new RuntimeException("You failed", e);
  }
}

But I suspect that I need to map my JSON to a POJO?
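
A minimal sketch of what mapping the JSON to a POJO could look like, so that a class from the user jar (rather than Jackson's own ObjectNode) passes through Jackson. It assumes jackson-databind is on the classpath; the names JacksonPojoRepro and Event are hypothetical and not taken from the thread. Whether this is enough to reproduce the leak described in FLINK-5233 has not been verified here.

```java
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.io.UncheckedIOException;

public class JacksonPojoRepro {

    // Hypothetical user-code POJO. In a real job this class would live in
    // the user jar and therefore be loaded by the user-code class loader.
    public static class Event {
        public String a;
    }

    // Deserializes the JSON into the POJO instead of an ObjectNode, so that
    // Jackson has to introspect a class that belongs to the user code.
    public static Event parse(String json) {
        ObjectMapper mapper = new ObjectMapper();
        try {
            return mapper.readValue(json, Event.class);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not parse JSON", e);
        }
    }

    public static void main(String[] args) {
        Event event = parse("{\"a\": \"b\"}");
        System.out.println(event.a);
    }
}
```

In the snippet above, the body of parse() would replace the readValue(..., ObjectNode.class) call inside open().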


On Mon, Dec 5, 2016 at 12:33 PM, Konstantin Knauf <
konstantin.knauf@tngtech.com> wrote:

> Hi Robert,
>
> you need to actually use Jackson. The problematic field is a cache that
> is filled with every class that Jackson has serialized or deserialized.
>
> Best,
>
> Konstantin
>
> On 05.12.2016 11:55, Robert Metzger wrote:
> > I've submitted Wordcount 410 times to a testing cluster and a streaming
> > job 290 times and I could not reproduce the issue with 1.1.3. Also, the
> > heapdump of one of the TaskManagers looked pretty normal.
> >
> > Do you have any ideas how to reproduce the issue?
> >
> > On Fri, Dec 2, 2016 at 3:21 PM, Robert Metzger <rmetzger@apache.org>
> > wrote:
> >
> >     Thank you for reporting the issue, Konstantin.
> >     I've filed a JIRA for the Jackson
> >     issue: https://issues.apache.org/jira/browse/FLINK-5233.
> >     As I said in the JIRA, I propose to upgrade to Jackson 2.7.8, as
> >     this version contains the fix for the issue, but it's not a major
> >     Jackson upgrade.
> >
> >     Any chance you could check whether 2.7.8 fixes the issue as well?
> >
> >
> >     On Fri, Dec 2, 2016 at 11:12 AM, Fabian Hueske <fhueske@gmail.com>
> >     wrote:
> >
> >         Hi Konstantin,
> >
> >         Regarding 2): I've opened FLINK-5227 to update the documentation
> >         [1].
> >
> >         Regarding the Row type: The Row type was introduced for
> >         flink-table and was later used by other modules. There is
> >         FLINK-5186 to move Row and all the related TypeInfo (+serializer
> >         and comparator) to flink-core [2]. That should solve your issue.
> >
> >         Some of the connector modules which provide TableSources and
> >         TableSinks have dependencies on flink-table as well. I'll check
> >         that these are optional dependencies so that we do not pull in
> >         Calcite through connectors for jobs that do not need it.
> >
> >         Thanks,
> >         Fabian
> >
> >         [1] https://issues.apache.org/jira/browse/FLINK-5227
> >         [2] https://issues.apache.org/jira/browse/FLINK-5186
> >
> >         2016-11-30 17:51 GMT+01:00 Konstantin Knauf
> >         <konstantin.knauf@tngtech.com>:
> >
> >             Hi Stefan,
> >
> >             unfortunately, I cannot share any heap dumps with you. I was
> >             able to resolve some of the issues myself today; the root
> >             causes were different for different jobs.
> >
> >             1) Jackson 2.7.2 (which comes with Flink) has a known class
> >             loading issue (see
> >             https://github.com/FasterXML/jackson-databind/issues/1363).
> >             Shipping a shaded version of Jackson 2.8.4 with our user
> >             code helped. I recommend upgrading Flink's Jackson version
> >             soon.
> >
> >             2) We have a dependency on flink-table [1], which ships with
> >             Calcite including the Calcite JDBC driver, which cannot be
> >             collected because of the known problem with
> >             java.sql.DriverManager. Putting flink-table in Flink's lib
> >             dir instead of shipping it with the user code helps. You
> >             should update the documentation, because I think this will
> >             always happen when using flink-table. So I wonder why this
> >             has not come up before.
> >
> >             3) Unresolved: Some threads in a custom source are not
> >             properly shut down and keep references to the
> >             UserCodeClassLoader. I did not have time to look into this
> >             issue so far.
> >
> >             Cheers,
> >
> >             Konstantin
> >
> >             [1] Side note: We only need flink-table for the "Row" class
> >             used in the JdbcOutputFormat, so it might make sense to move
> >             this class somewhere else. Naturally, we also tried to
> >             exclude the "transitive" dependency on org.apache.calcite
> >             until we noticed that Calcite is packaged with flink-table,
> >             so that you cannot even exclude it. What is the reason for
> >             this?
> >
> >
> >
> >
> >             On 30.11.2016 00:55, Stefan Richter wrote:
> >             > Hi,
> >             >
> >             > could you somehow provide us a heap dump from a TM that
> >             > has run for a while (ideally, shortly before an OOME)? This
> >             > would greatly help us to figure out if there is a
> >             > classloader leak that causes the problem.
> >             >
> >             > Best,
> >             > Stefan
> >             >
> >             >> On 29.11.2016 at 18:39, Konstantin Knauf
> >             >> <konstantin.knauf@tngtech.com> wrote:
> >             >>
> >             >> Hi everyone,
> >             >>
> >             >> since upgrading to Flink 1.1.3 we observe frequent OOME
> >             >> PermGen TaskManager failures. Monitoring the PermGen size
> >             >> on one of the TaskManagers, you can see that each job (new
> >             >> jobs and restarts) adds a few MB that cannot be collected.
> >             >> Eventually, the OOME happens. This happens with all our
> >             >> jobs, streaming and batch, on YARN 2.4 as well as
> >             >> standalone.
> >             >>
> >             >> On Flink 1.0.2 this was not a problem, but I will
> >             >> investigate it further.
> >             >>
> >             >> The assumption is that Flink is somehow using one of the
> >             >> classes that come with our jar and thereby prevents the
> >             >> GC of the whole class loader. Our jars do not include any
> >             >> Flink dependencies though (compileOnly), but of course
> >             >> many others.
> >             >>
> >             >> Any ideas anyone?
> >             >>
> >             >> Cheers and thank you,
> >             >>
> >             >> Konstantin
> >             >>
> >             >> sent from my phone. Plz excuse brevity and tpyos.
> >             >> ---
> >             >> Konstantin Knauf * konstantin.knauf@tngtech.com * +49-174-3413182
> >             >> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> >             >> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
> >             >
> >             >
> >
> >             --
> >             Konstantin Knauf * konstantin.knauf@tngtech.com * +49-174-3413182
> >             TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> >             Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
> >             Sitz: Unterföhring * Amtsgericht München * HRB 135082
> >
> >
> >
> >
>
> --
> Konstantin Knauf * konstantin.knauf@tngtech.com * +49-174-3413182
> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
> Sitz: Unterföhring * Amtsgericht München * HRB 135082
>
>
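
Point 2) in the quoted message of 30.11.2016 attributes one leak to the Calcite JDBC driver staying registered with java.sql.DriverManager, and the workaround given there is to put flink-table into Flink's lib dir. A different mitigation, which nobody in this thread proposed or tested, is to deregister the drivers that were loaded by the user-code class loader when the job shuts down. A minimal sketch using only JDK APIs; the class name JdbcDriverCleanup is hypothetical:

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Enumeration;

public final class JdbcDriverCleanup {

    private JdbcDriverCleanup() {
    }

    // Deregisters every JDBC driver whose class was loaded by the given
    // class loader and returns how many drivers were removed.
    // Note: DriverManager only shows the caller the drivers that are
    // visible from the caller's own class loader.
    public static int deregisterDriversOf(ClassLoader loader) {
        int removed = 0;
        Enumeration<Driver> drivers = DriverManager.getDrivers();
        while (drivers.hasMoreElements()) {
            Driver driver = drivers.nextElement();
            if (driver.getClass().getClassLoader() == loader) {
                try {
                    DriverManager.deregisterDriver(driver);
                    removed++;
                } catch (SQLException e) {
                    throw new IllegalStateException(
                        "Could not deregister " + driver.getClass().getName(), e);
                }
            }
        }
        return removed;
    }
}
```

Assuming the helper is shipped in the user jar, it could be called from a function's close() method with the class loader of a user class, for example JdbcDriverCleanup.deregisterDriversOf(MyFunction.class.getClassLoader()), where MyFunction is a placeholder.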

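Point 3) in the same quoted message mentions threads in a custom source that are not shut down and keep references to the UserCodeClassLoader. A live thread keeps its Runnable, and through it the class loader that loaded the Runnable's class, reachable. The sketch below shows one common way to stop such a thread deterministically. It uses only the JDK; the class name StoppableWorker and the 5 second join timeout are assumptions, and the custom source from the thread is not shown anywhere, so this is not its actual code.

```java
public class StoppableWorker implements AutoCloseable {

    // Flag checked by the worker loop; volatile so the write in close()
    // is visible to the worker thread.
    private volatile boolean running = true;
    private final Thread thread;

    public StoppableWorker() {
        thread = new Thread(this::loop, "custom-source-worker");
        thread.setDaemon(true);
    }

    public void start() {
        thread.start();
    }

    private void loop() {
        while (running) {
            try {
                // Stands in for polling an external system.
                Thread.sleep(10);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and leave the loop.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public boolean isAlive() {
        return thread.isAlive();
    }

    // Signals the loop to stop, interrupts a blocking call if there is
    // one, and waits (bounded) until the thread has terminated.
    @Override
    public void close() {
        running = false;
        thread.interrupt();
        try {
            thread.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In a Flink source, close() of such a worker would typically be called from the source's own cancel() or close() method, so that no thread created by user code outlives the job.
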