hawq-dev mailing list archives

From Ming Li <...@pivotal.io>
Subject Re: About hawq-config slowness
Date Mon, 05 Sep 2016 14:19:51 GMT
Hi Paul,

Have you measured how much time each of these 3 separate enhancement
fixes saves?

And for point (3), I suspect the best thread count depends on the number
of hawq segment nodes and on how many commands need to be passed into
the pool.

Thanks.


On Sat, Aug 27, 2016 at 4:39 PM, Paul Guo <paulguo@gmail.com> wrote:

> Looked into the Python GIL documents. It looks like even with the GIL,
> Python threads can still help IO-bound (at least non-CPU-bound) tasks. We
> could limit the worker count to min(16, queue entry number). Thanks.
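As a plain-Python sketch of the sizing Paul proposes (the helper name is hypothetical, not actual gppylib code):

```python
# Sketch of the proposed worker sizing (helper name is hypothetical):
# cap the pool at 16 workers, but never spawn more workers than there
# are queued commands.
def pick_worker_count(num_commands, cap=16):
    return min(cap, num_commands)
```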
>
> 2016-08-23 23:06 GMT+08:00 Kyle Dunn <kdunn@pivotal.io>:
>
> > Paul,
> >
> > This is a great finding! I think additional worker threads might make
> > sense when clusters are larger, but otherwise a number like 8 is a safe
> > bet, especially given its positive impact on the user experience in the
> > majority of cases.
> >
> > +1 for tuning these down to improve latency.
> >
> > -Kyle
> >
> > On Tue, Aug 23, 2016, 08:31 Paul Guo <paulguo@gmail.com> wrote:
> >
> > > Recently I noticed that hawq-config seems slow. For example, a simple
> > > GUC-setting command line, "hawq config -c lc_messages -v en_US.UTF-8",
> > > takes roughly 6+ seconds on my CentOS VM. Looking into the details of
> > > the command line, I found this is really not expected.
> > >
> > > After a quick look at the hawq-config and Python library code, it
> > > looks like the several issues below affect the speed.
> > >
> > > 1) gpscp
> > > It still uses popen2.Popen4(). In the test command above, this function
> > > ends up issuing millions of useless close() syscalls. Using
> > > subprocess.Popen() without close_fds as an alternative easily resolves
> > > this.
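A minimal sketch of the suggested replacement, assuming the command runs through a shell with stderr merged into stdout the way popen2.Popen4 behaves (the helper name is hypothetical):

```python
import subprocess

# Sketch of replacing popen2.Popen4 with subprocess.Popen (helper name
# is hypothetical). Popen4 merges stderr into stdout, which
# stderr=subprocess.STDOUT reproduces. On Python 2, leaving close_fds
# at its default of False avoids the per-fd close() loop that popen2
# runs over the whole descriptor table before exec'ing the child.
def run_combined(cmd):
    proc = subprocess.Popen(cmd, shell=True,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    return proc.returncode, out
```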
> > >
> > > 2) gppylib/commands/base.py
> > >
> > >     def __init__(self,name,pool,timeout=5):
> > >
> > > The worker thread blocks for up to 5 seconds in each loop iteration
> > > (Queue.get()) waiting for potential commands, even when we already
> > > know that some threads will get no more commands to run. This really
> > > does not make sense, since idle threads will also block for 5 seconds
> > > before exiting.
> > >
> > > Setting the timeout to zero makes the Python code spin. I tested a
> > > small timeout value, e.g. 0.1s, and it works fine. It seems that 0.1
> > > is a good timeout candidate.
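A sketch of what such a worker loop looks like with the shorter timeout (using Python 3's queue module name here; the actual gppylib code, which this does not reproduce, uses Python 2's Queue):

```python
import queue      # Python 2's module is named Queue
import threading

# Sketch of a pool worker loop (not the actual gppylib code) using the
# proposed 0.1s timeout: an idle worker notices shutdown within ~0.1s
# instead of blocking up to 5s inside Queue.get().
def worker(cmd_queue, shutdown, timeout=0.1):
    results = []
    while not shutdown.is_set() or not cmd_queue.empty():
        try:
            cmd = cmd_queue.get(timeout=timeout)
        except queue.Empty:
            continue
        results.append(cmd())  # run the queued command
        cmd_queue.task_done()
    return results
```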
> > >
> > > 3) gppylib/commands/base.py
> > >
> > >     def __init__(self,numWorkers=16,items=None):
> > >
> > > WorkerPool creates 16 threads by default, but to my knowledge
> > > CPython's threads do not scale well because of the global interpreter
> > > lock (GIL). I'm not a Python expert, so I'm wondering whether a
> > > smaller thread count (e.g. 8) is really enough, either in theory or
> > > in practice (e.g. from previous test results).
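The GIL question can be sanity-checked empirically: time.sleep releases the GIL, like the network waits these commands mostly consist of, so N sleeping threads finish in roughly one delay of wall time rather than N delays. A measurement sketch (not hawq code):

```python
import threading
import time

# Sanity check for the GIL concern: time.sleep releases the GIL, so N
# threads sleeping "in parallel" should take roughly one `delay` of
# wall time, not n_threads * delay, showing that threads do overlap on
# IO-bound work despite the GIL.
def measure_parallel_sleep(n_threads, delay=0.2):
    threads = [threading.Thread(target=time.sleep, args=(delay,))
               for _ in range(n_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start
```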
> > >
> > --
> > *Kyle Dunn | Data Engineering | Pivotal*
> > Direct: 303.905.3171 | Email: kdunn@pivotal.io
> >
>
