flink-user mailing list archives

From Habib Mostafaei <ha...@inet.tu-berlin.de>
Subject Re: low performance in running queries
Date Fri, 01 Nov 2019 08:40:27 GMT
I used the streaming WordCount example provided by Flink, and the file contains
text like "This is some text...". I just copied it several times.
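
For reference, here is a minimal sketch of a single-threaded baseline for this kind of input. It is not Flink's WordCount and not the Python script from my earlier mail; it is plain Java that writes a repeated sample line to a file and counts words by lowercasing and splitting on non-word characters. The class name `WordCountBaseline`, both method names, and the sample line are assumptions made only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class: a plain-Java baseline, not part of Flink.
public class WordCountBaseline {

    // Writes `line` to `file` `copies` times, one copy per line.
    public static void writeRepeated(Path file, String line, int copies) throws IOException {
        Files.write(file, Collections.nCopies(copies, line), StandardCharsets.UTF_8);
    }

    // Counts words in a single thread: lowercase each line, split on non-word characters.
    public static Map<String, Long> countWords(Path file) throws IOException {
        Map<String, Long> counts = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String token : line.toLowerCase().split("\\W+")) {
                    if (!token.isEmpty()) {
                        counts.merge(token, 1L, Long::sum);
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // The sample line is a stand-in; the real file content is not reproduced here.
        Path file = Files.createTempFile("wordcount-sample", ".txt");
        writeRepeated(file, "This is some text", 1000);

        long start = System.nanoTime();
        Map<String, Long> counts = countWords(file);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println(counts + " in " + elapsedMs + " ms");
        Files.delete(file);
    }
}
```

Increasing the number of copies gives a rough single-threaded reference time to compare against the Flink job on the same machine.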

Best,

Habib

On 11/1/2019 6:03 AM, Zhenghua Gao wrote:
> 2019-10-30 15:59:52,122 INFO  org.apache.flink.runtime.taskmanager.Task - Split Reader: Custom File Source -> Flat Map (1/1) (6a17c410c3e36f524bb774d2dffed4a4) switched from DEPLOYING to RUNNING.
> 2019-10-30 17:45:10,943 INFO  org.apache.flink.runtime.taskmanager.Task - Split Reader: Custom File Source -> Flat Map (1/1) (6a17c410c3e36f524bb774d2dffed4a4) switched from RUNNING to FINISHED.
> It's surprising that the source task took about 105 minutes to read a 2 GB file.
> Could you share your code snippets and some sample lines from the 2 GB file?
> I will try to reproduce your scenario and dig into the root cause.
> *Best Regards,*
> *Zhenghua Gao*
>
>
> On Thu, Oct 31, 2019 at 9:05 PM Habib Mostafaei 
> <habib@inet.tu-berlin.de <mailto:habib@inet.tu-berlin.de>> wrote:
>
>     I have enclosed all logs from the run; for this run I used a
>     parallelism of one. In other runs, however, I checked and found that
>     all parallel workers were working properly. Is there a simple way
>     to get profiling information in Flink?
>
>     Best,
>
>     Habib
>
>     On 10/31/2019 2:54 AM, Zhenghua Gao wrote:
>>     I think more runtime information would help figure
>>     out where the problem is:
>>     1) how many parallel subtasks are actually working
>>     2) the metrics for each operator
>>     3) the JVM profiling information, etc.
>>
>>     *Best Regards,*
>>     *Zhenghua Gao*
>>
>>
>>     On Wed, Oct 30, 2019 at 8:25 PM Habib Mostafaei
>>     <habib@inet.tu-berlin.de <mailto:habib@inet.tu-berlin.de>> wrote:
>>
>>         Thanks, Gao, for the reply. I used the parallelism parameter
>>         with different values like 6 and 8, but the execution time is
>>         still not comparable with a single-threaded Python script.
>>         What would be a reasonable value for the parallelism?
>>
>>         Best,
>>
>>         Habib
>>
>>         On 10/30/2019 1:17 PM, Zhenghua Gao wrote:
>>>         The reason might be that the parallelism of your task is only 1,
>>>         which is too low.
>>>         See [1] to specify a proper parallelism for your job; the
>>>         execution time should then be reduced significantly.
>>>
>>>         [1]
>>>         https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html
>>>
>>>         *Best Regards,*
>>>         *Zhenghua Gao*
>>>
>>>
>>>         On Tue, Oct 29, 2019 at 9:27 PM Habib Mostafaei
>>>         <habib@inet.tu-berlin.de <mailto:habib@inet.tu-berlin.de>>
>>>         wrote:
>>>
>>>             Hi all,
>>>
>>>             I am running Flink on a standalone cluster and getting very
>>>             long execution times for streaming queries like WordCount on
>>>             a fixed text file. My VM runs Debian 10 with 16 CPU cores and
>>>             32 GB of RAM. I have a 2 GB text file. When I run Flink on a
>>>             standalone cluster, i.e., one JobManager and one TaskManager
>>>             with a 25 GB heap, it takes around two hours to finish
>>>             counting this file, while a simple Python script can do it in
>>>             around 7 minutes. I am wondering what is wrong with my setup.
>>>             I also ran the experiments on a cluster with six
>>>             TaskManagers, but I still get very long execution times of
>>>             25 minutes or so. I tried increasing the JVM heap size to
>>>             lower the execution time, but it did not help. I attached the
>>>             log file and the Flink configuration file to this email.
>>>
>>>             Best,
>>>
>>>             Habib
>>>
>