flink-dev mailing list archives

From Chesnay Schepler <chesnay.schep...@fu-berlin.de>
Subject Python API - Weird Performance Issue
Date Wed, 27 Aug 2014 18:34:01 GMT
Hello everyone,

This is something of a brainstorming question.

As some of you may know, I am currently working on the Python API. The
most crucial part here is how data is exchanged between Java and Python.
Until now we used pipes for this, but we recently switched to
memory-mapped files in the hope of improving the (lacking) performance.
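To make the two transport schemes concrete, here is a minimal
single-process sketch (not the Python API's actual code) that passes one
length-prefixed record through an OS pipe and through a memory-mapped
file. BUFFER_SIZE, the 4-byte big-endian length prefix and the function
names are hypothetical choices for illustration. A real Java/Python
exchange also needs cross-process signalling (who has written, who may
read), which this sketch deliberately omits.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical buffer size chosen for illustration only.
BUFFER_SIZE = 4096


def send_via_pipe(payload: bytes) -> bytes:
    """Write a length-prefixed record into a pipe and read it back."""
    if len(payload) + 4 > BUFFER_SIZE:
        raise ValueError("payload does not fit in the buffer")
    read_fd, write_fd = os.pipe()
    try:
        # Prefix the record with its length so the reader knows how much to consume.
        os.write(write_fd, struct.pack(">i", len(payload)) + payload)
        (length,) = struct.unpack(">i", os.read(read_fd, 4))
        received = b""
        # os.read may return fewer bytes than requested, so loop until complete.
        while len(received) < length:
            received += os.read(read_fd, length - len(received))
        return received
    finally:
        os.close(read_fd)
        os.close(write_fd)


def send_via_mmap(payload: bytes) -> bytes:
    """Write a length-prefixed record into a mapped file and read it via a second mapping."""
    if len(payload) + 4 > BUFFER_SIZE:
        raise ValueError("payload does not fit in the buffer")
    with tempfile.TemporaryFile() as f:
        # The file must have a non-zero size before it can be mapped.
        f.truncate(BUFFER_SIZE)
        # Two independent mappings of the same file stand in for the two sides.
        writer = mmap.mmap(f.fileno(), BUFFER_SIZE)
        reader = mmap.mmap(f.fileno(), BUFFER_SIZE)
        try:
            writer.seek(0)
            writer.write(struct.pack(">i", len(payload)) + payload)
            (length,) = struct.unpack(">i", reader[0:4])
            return bytes(reader[4:4 + length])
        finally:
            writer.close()
            reader.close()
```

Both functions return the payload unchanged; for example,
send_via_mmap(b"record") returns b"record". The mmap variant avoids a
copy through the kernel per record, which is the kind of gain a
simplified prototype would show.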

Early (simplified) prototypes (outside of Flink) showed that this would
yield a significant speedup. Yet when I added the code to Flink and ran
a job, there was no effect, none at all: two radically different schemes
ran in /exactly/ the same time.

My conclusion was that code already in place (and not part of the
prototypes) was responsible for this. So I went ahead and modified the
prototypes to use all relevant code from the Python API in order to
narrow down the culprit. But this time, the performance increase was
there.

Now here's the question: how can the /very same code/ perform so much
worse when integrated into Flink? If the code is not the problem, what
could it be?

I spent a lot of time looking for the one line of code that cripples
the performance, but I'm pretty much out of places to look.
