hawq-dev mailing list archives

From Paul Guo <paul...@gmail.com>
Subject Re: About hawq-config slowness
Date Sat, 27 Aug 2016 08:39:26 GMT
I looked into the Python GIL documentation. It appears that even with the GIL,
multiple Python threads can still help with IO-bound (at least non-CPU-bound)
tasks. We could limit the worker count to min(16, queue entry number). Thanks.
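For illustration, the capped worker count could look roughly like the sketch
below. This is a hypothetical helper, not the actual gppylib WorkerPool API;
the sleep stands in for an IO-bound call (ssh/scp), which is where threads can
overlap work despite the GIL:

```python
import queue
import threading
import time

def run_io_bound(items, max_workers=16):
    """Run IO-bound tasks with min(max_workers, len(items)) threads.

    Hypothetical sketch: spawning more threads than queued items only
    creates workers that sit idle until shutdown.
    """
    work = queue.Queue()
    for item in items:
        work.put(item)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return  # no more work: exit immediately instead of blocking
            time.sleep(0.01)  # stands in for an IO-bound call (ssh, scp, ...)
            with lock:
                results.append(item)
            work.task_done()

    n_workers = min(max_workers, len(items))  # cap at the number of queue entries
    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, n_workers
```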

2016-08-23 23:06 GMT+08:00 Kyle Dunn <kdunn@pivotal.io>:

> Paul,
>
> This is a great finding! I think additional worker threads might make sense
> when clusters are larger but otherwise a number like 8 is a safe bet,
> especially given its positive impact on the user experience in the majority
> of cases.
>
> +1 for tuning these down to improve latency.
>
> -Kyle
>
> On Tue, Aug 23, 2016, 08:31 Paul Guo <paulguo@gmail.com> wrote:
>
> > Recently I noticed that hawq config seems to be slow. For example, a
> > simple GUC-setting command line, "hawq config -c lc_messages -v
> > en_US.UTF-8", takes roughly 6+ seconds on my CentOS VM, but looking
> > into the details of the command, I found this is really not expected.
> >
> > After a quick look at the hawq-config and Python library code, I found
> > that the several issues below affect the speed.
> >
> > 1) gpscp
> > It still uses popen2.Popen4(). In the test command above, this function
> > ultimately issues millions of useless close() syscalls. Using
> > subprocess.Popen() without close_fds as an alternative easily resolves
> > this.
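As a rough illustration of the suggested replacement (run_command is a
hypothetical wrapper, not the actual gpscp code): popen2.Popen4() merges
stderr into stdout and closes every possible file descriptor in the child,
which is where the close() syscalls come from; subprocess.Popen with
close_fds=False skips that loop:

```python
import subprocess

def run_command(cmd):
    """Roughly equivalent to popen2.Popen4(cmd): stderr merged into stdout.

    Hypothetical sketch: close_fds=False avoids the per-descriptor close()
    loop that popen2.Popen4 performed in the child process.
    """
    proc = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # Popen4 merges stderr into stdout
        close_fds=False,           # skip the per-fd close() syscalls
    )
    out, _ = proc.communicate()
    return proc.returncode, out
```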
> >
> > 2) gppylib/commands/base.py
> >
> >     def __init__(self,name,pool,timeout=5):
> >
> > Each worker thread blocks for up to 5 seconds per loop iteration
> > (Queue.get()) waiting for potential commands, even when we already know
> > there will be no more commands for some threads to run. This really
> > does not make sense, since idle threads will also block for up to 5
> > seconds before exiting.
> >
> > Setting the timeout to zero would make the Python code spin. I tested a
> > small timeout value, e.g. 0.1s, and it works fine. 0.1 seems to be a
> > good timeout candidate.
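The trade-off can be sketched as below (hypothetical names, not the actual
gppylib base.py code): a small Queue.get() timeout lets an idle worker notice
shutdown quickly, while timeout=0 would busy-spin and timeout=5 delays exit by
up to 5 seconds:

```python
import queue
import threading
import time

def shutdown_latency(timeout):
    """Measure how long an idle worker takes to exit after stop is signaled.

    Hypothetical sketch of the worker loop: block briefly in Queue.get(),
    then re-check the stop flag, so idle threads exit within ~timeout seconds.
    """
    q = queue.Queue()
    stop = threading.Event()

    def worker():
        while not stop.is_set():
            try:
                q.get(timeout=timeout)  # wake at most every `timeout` seconds
            except queue.Empty:
                continue  # no work yet; loop around and re-check the stop flag

    t = threading.Thread(target=worker)
    t.start()
    start = time.monotonic()
    stop.set()
    t.join()
    return time.monotonic() - start
```

With timeout=0.1 the idle worker exits almost immediately; with the original
timeout=5 the same join could take up to 5 seconds per idle thread.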
> >
> > 3) gppylib/commands/base.py
> >
> >     def __init__(self,numWorkers=16,items=None):
> >
> > WorkerPool creates 16 threads by default, but to my knowledge CPython
> > threads do not scale well due to the global interpreter lock (GIL). I'm
> > not a Python expert, so I'm wondering whether a smaller thread count
> > (e.g. 8) is really enough, either in theory or in practice (e.g. from
> > previous test results)?
> >
> --
> *Kyle Dunn | Data Engineering | Pivotal*
> Direct: 303.905.3171 | Email: kdunn@pivotal.io
>
