hawq-dev mailing list archives

From Paul Guo <paul...@gmail.com>
Subject About hawq-config slowness
Date Tue, 23 Aug 2016 14:31:43 GMT
Recently I noticed that hawq-config seems slow. For example, a simple GUC
setting command, "hawq config -c lc_messages -v en_US.UTF-8", takes roughly
6+ seconds on my CentOS VM. Looking into what the command actually does,
that is much longer than it should take.

I took a quick look at the hawq-config and Python lib code, and it looks
like the issues below affect the speed.

1) gpscp
It still uses popen2.Popen4(). In the test command above, this function
ends up issuing millions of useless close() system calls. Using
subprocess.Popen() without close_fds as an alternative easily resolves this.
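
A minimal sketch of that replacement, not the actual gpscp change. The
helper name run_merged is hypothetical, and it assumes the command is a
shell string, as with popen2.Popen4(cmd). stderr is merged into stdout to
match what Popen4 did. close_fds=False is already the default in Python 2
and is spelled out here only so the behaviour is explicit.

```python
import subprocess


def run_merged(cmd):
    """Run a shell command; return (exit_status, combined stdout+stderr).

    Hypothetical stand-in for a popen2.Popen4(cmd) call site.
    """
    proc = subprocess.Popen(
        cmd,
        shell=True,                 # assumption: cmd is a shell string
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr into stdout, like Popen4
        close_fds=False,            # do not loop over every possible fd
    )
    output, _ = proc.communicate()  # waits for exit and drains the pipe
    return proc.returncode, output
```

For example, run_merged("echo hello; echo oops 1>&2") returns the exit
status together with both lines of output in a single byte string.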

2) gppylib/commands/base.py

    def __init__(self,name,pool,timeout=5):

In each loop iteration the worker thread blocks for up to 5 seconds in
Queue.get() waiting for potential commands, even when we already know that
there are no more commands for some threads to run. This does not make
sense, since idle threads also block for up to 5 seconds before exiting.

Setting the timeout to zero makes the Python code spin. I tested a small
timeout value, e.g. 0.1s, and it works fine, so 0.1 seems to be a good
timeout candidate.
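
The effect of the timeout can be sketched with a simplified worker loop.
This is not the gppylib Worker class: the class layout, the stop_requested
event and the time_idle_shutdown helper are all hypothetical, and only the
Queue.get(timeout=...) pattern is taken from the description above.

```python
import threading
import time

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2


class Worker(threading.Thread):
    """Simplified worker: polls a queue until asked to stop."""

    def __init__(self, name, work_queue, timeout=0.1):
        threading.Thread.__init__(self, name=name)
        self.work_queue = work_queue
        self.timeout = timeout
        self.stop_requested = threading.Event()  # hypothetical stop flag
        self.results = []

    def run(self):
        while not self.stop_requested.is_set():
            try:
                # Blocks for at most self.timeout seconds, so an idle
                # worker notices the stop flag within that time.
                task = self.work_queue.get(timeout=self.timeout)
            except queue.Empty:
                continue
            try:
                self.results.append(task())
            finally:
                self.work_queue.task_done()


def time_idle_shutdown(timeout):
    """Return how long an idle worker takes to exit after a stop request."""
    worker = Worker("idle", queue.Queue(), timeout=timeout)
    worker.start()
    time.sleep(0.05)              # let the worker block inside get()
    start = time.time()
    worker.stop_requested.set()
    worker.join()
    return time.time() - start
```

Calling time_idle_shutdown(0.1) and time_idle_shutdown(5) side by side
shows the difference: the idle worker can only exit once its current
get() call times out.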

3) gppylib/commands/base.py

    def __init__(self,numWorkers=16,items=None):

WorkerPool creates 16 threads by default, but to my knowledge CPython
threads do not scale well because of the global interpreter lock (GIL).
I'm not a Python expert, so I'm wondering whether a smaller number of
threads (e.g. 8) would be enough, either in theory or in practice (e.g.
based on previous test results).
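
One rough way to get numbers for this question is a small timing harness.
The sketch below is only an assumption-laden stand-in: time.sleep()
represents a worker waiting on a remote command, which releases the GIL
the same way other blocking calls do. CPU-bound work would behave
differently. The function names are hypothetical.

```python
import threading
import time


def blocking_task(seconds=0.1):
    # Stand-in for a worker waiting on a remote command (assumption).
    time.sleep(seconds)


def run_with_threads(num_threads, num_tasks, seconds=0.1):
    """Spread num_tasks blocking tasks over num_threads; return elapsed time."""
    # Distribute the tasks as evenly as possible across the threads.
    per_thread = [num_tasks // num_threads] * num_threads
    for i in range(num_tasks % num_threads):
        per_thread[i] += 1

    def loop(count):
        for _ in range(count):
            blocking_task(seconds)

    threads = [threading.Thread(target=loop, args=(c,)) for c in per_thread]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start
```

Comparing run_with_threads(8, 16) against run_with_threads(16, 16) on the
target machine gives a first data point for blocking workloads; a real
measurement would replace blocking_task with the actual commands.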
