flink-user mailing list archives

From Robert Schmidtke <ro.schmid...@gmail.com>
Subject Re: Flink performance pre-packaged vs. self-compiled
Date Thu, 14 Apr 2016 23:11:57 GMT
You're obviously right: the configs were different. In the downloaded
version I had set off-heap memory to true, whereas in the version I
compiled myself this one-time change to flink-conf.yaml was overwritten by
recompiling. I have fixed it now and performance is the same.
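
A minimal sketch of how such a mismatch could be spotted, assuming both installations keep a flink-conf.yaml made of simple `key: value` lines. The helper names and the sample values are hypothetical, and the key `taskmanager.memory.off-heap` is an assumption about which setting was involved; the thread only says "off-heap memory".

```python
def parse_flat_conf(text):
    """Parse simple 'key: value' lines, skipping blank lines and '#' comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split at the first ':' only, so values containing ':' stay intact.
        key, sep, value = line.partition(":")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def diff_conf(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Illustrative contents only, not the actual files from this thread.
downloaded = """
# edited by hand after unpacking
taskmanager.memory.off-heap: true
taskmanager.heap.mb: 4096
"""
self_built = """
taskmanager.memory.off-heap: false
taskmanager.heap.mb: 4096
"""

differences = diff_conf(parse_flat_conf(downloaded), parse_flat_conf(self_built))
for key, (left, right) in differences.items():
    print(f"{key}: downloaded={left!r} self-built={right!r}")
```

In practice the two strings would be read from the conf directories of the two installations; keeping the hand-edited file outside the build output avoids losing the change on the next rebuild.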

For the record, I had 30 GiB of TeraGen'd data and ran the job with:

-m yarn-cluster \
  -yn 10 \
  -ys 4 \
  -p 40 \
  -yjm 3072 \
  -ytm 4096

Each of the nodes has 64 GiB of RAM; the job ran in 27 s, repeatedly.
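
As a quick sanity check on the options above, the arithmetic can be sketched as follows. It uses only the numbers given in this message; the variable names are made up for illustration, and the even split of data across slots is an assumption that ignores skew.

```python
# Values copied from the options and dataset size stated above.
yarn_containers = 10        # -yn: number of YARN containers (TaskManagers)
slots_per_taskmanager = 4   # -ys: slots per TaskManager
parallelism = 40            # -p: job parallelism
taskmanager_mem_mb = 4096   # -ytm: memory per TaskManager container, in MB
dataset_gib = 30            # TeraGen'd input size

# Total slots available to the job.
total_slots = yarn_containers * slots_per_taskmanager

# Memory requested for all TaskManagers together, in GiB.
total_tm_mem_gib = yarn_containers * taskmanager_mem_mb / 1024

# Input per parallel task, assuming a perfectly even split.
data_per_slot_mib = dataset_gib * 1024 / parallelism

print(total_slots, total_tm_mem_gib, data_per_slot_mib)
```

The slot count matches the requested parallelism, so every slot is used by the job.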

Thanks and sorry for not having checked the obvious ...

Robert

On Thu, Apr 14, 2016 at 10:23 PM, Ovidiu-Cristian MARCU <
ovidiu-cristian.marcu@inria.fr> wrote:

> Hi,
>
> Your assumption may be incorrect related to the TeraSort use case for
> eastcirclek's implementation.
> How many times did you run your program?
> It would be helpful to give more details about your experiment, in terms
> of configuration and dataset size.
>
> Best,
> Ovidiu
>
> On 14 Apr 2016, at 17:14, Robert Schmidtke <ro.schmidtke@gmail.com> wrote:
>
> I have tried multiple Maven and Scala versions, but to no avail. I can't
> seem to achieve the performance of the downloaded archive. I am stumped by
> this and will need to do more experiments when I have more time.
>
> Robert
>
> On Thu, Apr 14, 2016 at 1:13 PM, Robert Schmidtke <ro.schmidtke@gmail.com>
> wrote:
>
>> Hi Robert,
>>
>> thanks for the hint! Looks like something I could have figured out myself
>> -.-" I'll let you know if I find something.
>>
>> Robert
>>
>> On Thu, Apr 14, 2016 at 1:06 PM, Robert Metzger <rmetzger@apache.org>
>> wrote:
>>
>>> Hi Robert,
>>>
>>> check out the tools/create_release_files.sh file in the source tree.
>>> There you can see how we are building the release binaries.
>>> It would be quite interesting to find out what caused the performance
>>> difference.
>>>
>>> On Wed, Apr 13, 2016 at 5:03 PM, Robert Schmidtke <
>>> ro.schmidtke@gmail.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I'm using Flink 0.10.2 for some benchmarks and had to add some small
>>>> changes to Flink, which led me to compiling and running it myself. This is
>>>> when I noticed a performance difference in the pre-packaged Flink version
>>>> that I downloaded from the web (
>>>> http://archive.apache.org/dist/flink/flink-0.10.2/flink-0.10.2-bin-hadoop27.tgz)
>>>> versus the form of the release-0.10 branch I built myself (mvn
>>>> -Dhadoop.version=2.7.1 -Dscala-2.11 -DskipTests -Drat.skip=true clean
>>>> install // mvn version 3.0.4).
>>>>
>>>> I ran some version of TeraSort (https://github.com/eastcirclek/terasort)
>>>> and I noticed that the pre-packaged version of Flink performs 10-20% better
>>>> than the one I built myself (the only tweaks I made are in the CliFrontend
>>>> after the Job has finished running, so I would rule out bad programming on
>>>> my side).
>>>>
>>>> Has anyone come across this before? Or could you provide me with
>>>> clearer build instructions in order to reproduce the downloadable archive
>>>> as closely as possible? Thanks in advance!
>>>>
>>>> Robert
>>>>
>>>> --
>>>> My GPG Key ID: 336E2680


-- 
My GPG Key ID: 336E2680
