hawq-dev mailing list archives

From Kyle Dunn <kd...@pivotal.io>
Subject Re: About hawq-config slowness
Date Tue, 23 Aug 2016 15:06:51 GMT
Paul,

This is a great finding! I think additional worker threads might make sense
when clusters are larger but otherwise a number like 8 is a safe bet,
especially given its positive impact on the user experience in the majority
of cases.

+1 for tuning these down to improve latency.

-Kyle

On Tue, Aug 23, 2016, 08:31 Paul Guo <paulguo@gmail.com> wrote:

> Recently I noticed that hawq-config seems to be slow. For example, a simple
> GUC setting command, "hawq config -c lc_messages -v en_US.UTF-8", takes
> roughly 6+ seconds on my CentOS VM. Looking into the details of the command,
> I found that this much time is really not expected.
>
> After a quick look at the hawq-config and Python lib code, it looks like
> the several issues below affect the speed.
>
> 1) gpscp
> It still uses popen2.Popen4(). This function ends up issuing millions of
> useless close() system calls in the test command above. Using
> subprocess.Popen() without close_fds as an alternative easily resolves
> this.
>
> 2) gppylib/commands/base.py
>
>     def __init__(self,name,pool,timeout=5):
>
> The worker thread blocks for at most 5 seconds in each loop (Queue.get())
> to fetch potential commands, even when we already know that there will be
> no more commands to run for some threads. This does not make sense, since
> idle threads will also block for up to 5 seconds before exiting.
>
> Setting the timeout to zero makes the Python code spin. I tested a small
> timeout value, e.g. 0.1s, and it works fine. 0.1 seems to be a good
> timeout candidate.
>
> 3) gppylib/commands/base.py
>
>     def __init__(self,numWorkers=16,items=None):
>
> WorkerPool creates 16 threads by default, but to my knowledge CPython's
> threads do not work well in parallel because of the global interpreter
> lock (GIL). I'm not a Python expert, so I'm wondering whether a smaller
> number of threads (e.g. 8) would be enough, either in theory or in
> practice (e.g. from previous test results).
>
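To illustrate point 2 above, here is a minimal sketch of a worker loop whose Queue.get() timeout bounds how long an idle thread waits before it notices it should exit. This is not the gppylib code: the Worker class, the should_stop event, and the time_pool() helper are all hypothetical names made up for this example, and it is written for Python 3 (where the module is `queue` rather than `Queue`).

```python
import queue
import threading
import time


class Worker(threading.Thread):
    """Hypothetical stand-in for a pool worker; not the real gppylib class."""

    def __init__(self, name, work_queue, timeout=0.1):
        super().__init__(name=name)
        self.work_queue = work_queue
        self.timeout = timeout
        self.should_stop = threading.Event()

    def run(self):
        while not self.should_stop.is_set():
            try:
                # Blocks for at most `timeout` seconds when the queue is empty.
                task = self.work_queue.get(timeout=self.timeout)
            except queue.Empty:
                continue  # nothing to do; loop around and re-check the stop flag
            task()
            self.work_queue.task_done()


def time_pool(num_workers, timeout, num_tasks=4):
    """Run a few trivial tasks, then return the shutdown time in seconds."""
    work_queue = queue.Queue()
    workers = [Worker("worker%d" % i, work_queue, timeout)
               for i in range(num_workers)]
    for w in workers:
        w.start()
    for _ in range(num_tasks):
        work_queue.put(lambda: None)
    work_queue.join()  # wait until every queued task has been processed

    start = time.monotonic()
    for w in workers:
        w.should_stop.set()
    for w in workers:
        w.join()  # each idle worker exits once its current get() times out
    return time.monotonic() - start
```

Calling time_pool(8, 0.1) should return after roughly a tenth of a second at most, while time_pool(8, 5) can take up to about 5 seconds to shut down, since every idle worker has to sit out the remainder of its get() timeout before it sees the stop flag.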
-- 
*Kyle Dunn | Data Engineering | Pivotal*
Direct: 303.905.3171 | Email: kdunn@pivotal.io
