hawq-dev mailing list archives

From Lei Chang <lei_ch...@apache.org>
Subject Re: Network interconnect settings in IaaS environments
Date Sat, 17 Sep 2016 04:52:56 GMT
Here is some more information about the HAWQ interconnect. But NOTE that the
default values were all tuned on *physical* hardware and not on Azure. On
Amazon and VMware, all the default settings appear to work fine.

·       gp_interconnect_type: Sets the protocol used for inter-node
communication. Valid values are "tcp" and "udp". "udp" is the new UDP
interconnect implementation with flow control. Default value is "udp".

·       gp_interconnect_fc_method: Sets the flow control method used for
the UDP interconnect. Valid values are "capacity" and "loss". With
"capacity" based flow control, senders do not send packets when receivers
do not have capacity. "Loss" based flow control builds on "capacity" based
flow control and also tunes the sending speed according to packet losses.
Default value is "loss".

·       gp_interconnect_snd_queue_depth: A new parameter that specifies
the average size of a send queue. The buffer pool size for each send
process can be calculated as gp_interconnect_snd_queue_depth * number
of processes in the downstream gang. The default value is 2.

·       gp_interconnect_cache_future_packets: A new parameter that
controls whether future packets are cached on the receiver side. Default
value is "true".

·       gp_udp_bufsize_k: The context of gp_udp_bufsize_k is changed from
"PGC_SUSET" to "PGC_BACKEND" so that customers can customize the size of
the socket buffers used by the interconnect. The maximum value is also
changed to 32768KB = 32M.
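
As a rough illustration of the gp_interconnect_snd_queue_depth arithmetic
above, here is a small Python sketch (not HAWQ code). The function names are
hypothetical. The byte estimate additionally assumes that each buffer holds
one packet of at most gp_max_packet_size bytes (8192 is the documented default
mentioned later in this thread), which this message does not state.

```python
def send_buffer_pool_packets(snd_queue_depth: int, downstream_processes: int) -> int:
    """Buffers in one send process's pool: queue depth * downstream gang size."""
    if snd_queue_depth < 1 or downstream_processes < 1:
        raise ValueError("both arguments must be positive")
    return snd_queue_depth * downstream_processes


def send_buffer_pool_bytes(snd_queue_depth: int,
                           downstream_processes: int,
                           max_packet_size: int = 8192) -> int:
    """Upper-bound estimate of the pool size in bytes.

    Assumption (not from the message): one buffer holds one packet of at
    most max_packet_size bytes.
    """
    packets = send_buffer_pool_packets(snd_queue_depth, downstream_processes)
    return packets * max_packet_size


if __name__ == "__main__":
    # Default queue depth of 2 with a hypothetical downstream gang of 64 processes.
    print(send_buffer_pool_packets(2, 64))  # 128
    print(send_buffer_pool_bytes(2, 64))    # 1048576
```

The point of the sketch is only that the pool grows linearly with the size of
the downstream gang, so the same queue depth costs more memory per sender on
larger clusters.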

For the UDP interconnect, end users should tune the OS kernel memory used by
sockets. On Linux, the relevant settings are:

·       net.core.rmem_max

·       net.core.wmem_max

·       txqueuelen (Transmit Queue Length)

The recommended value for net.core.rmem_max and net.core.wmem_max is 2M (or
greater). txqueuelen can be increased if the OS introduces packet loss due
to kernel ring buffer overflow. If the number of nodes is large, users should
pay attention to the queue depth and socket buffer size settings to avoid
potential packet loss due to a small OS buffer size.
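
One way to check a node against the rmem_max/wmem_max recommendation above is
sketched below in Python. The function names are hypothetical, and reading
"2M" as 2 * 1024 * 1024 bytes is an assumption. The sketch reads the current
limits from /proc/sys/net/core, so the reader function works on Linux only.
It does not look at txqueuelen.

```python
from pathlib import Path

# Assumption: "2M" in the recommendation means 2 * 1024 * 1024 bytes.
RECOMMENDED_MIN_BYTES = 2 * 1024 * 1024


def undersized(settings: dict, minimum: int = RECOMMENDED_MIN_BYTES) -> list:
    """Return the names of settings whose value is below the minimum, sorted."""
    return sorted(name for name, value in settings.items() if value < minimum)


def read_linux_socket_limits(proc_root: str = "/proc/sys/net/core") -> dict:
    """Read net.core.rmem_max and net.core.wmem_max from procfs (Linux only)."""
    limits = {}
    for name in ("rmem_max", "wmem_max"):
        text = (Path(proc_root) / name).read_text()
        limits["net.core." + name] = int(text.strip())
    return limits


if __name__ == "__main__":
    try:
        current = read_linux_socket_limits()
    except OSError:
        print("could not read /proc/sys/net/core (not Linux?)")
    else:
        too_small = undersized(current)
        if too_small:
            print("below the recommended 2M:", ", ".join(too_small))
        else:
            print("socket buffer limits meet the recommendation")
```

Any setting reported as too small would then be raised with sysctl by an
administrator. The script itself only reads values and changes nothing.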

On Sat, Sep 17, 2016 at 12:44 PM, Lei Chang <lei_chang@apache.org> wrote:

> please see the comments inline
> On Sat, Sep 17, 2016 at 3:07 AM, Kyle Dunn <kdunn@pivotal.io> wrote:
>> In an ongoing evaluation of HAWQ in Azure, we've encountered some
>> sub-optimal network performance. It would be great to get some additional
>> information about a few server parameters related to the network:
>> - gp_max_packet_size
>>    The default is documented at 8192. Why was this number chosen? Should
>> this value be aligned with the network infrastructure's configured MTU,
>> accounting for the packet header size of the chosen interconnect type?
>>  (Azure only supports MTU 1500 and has been showing better reliability
>> using TCP in Greenplum)
> 8K is an empirical value from when we evaluated the interconnect
> performance on physical hardware; 8K was shown to perform best.
> But it has not been benchmarked on Azure, and it looks like UDP on Azure
> is not stable. You can set "gp_interconnect_log_stats" to see statistics
> about the queries. You can also use ifconfig to see packet errors.
> If the network is not stable, it is worth trying to decrease the value to
> less than 1500 to align the user-space packet size with the maximum kernel
> packet size. But decreasing the value increases the CPU cost of
> marshalling/unmarshalling the packets, so there is a tradeoff here.
>> - gp_interconnect_type
>>     The docs claim UDPIFC is the default, UDP is the observed default. Do
>> the recommendations around which setting to use vary in an IaaS
>> environment (AWS or Azure)?
> Which doc? When we released UDPIFC for gpdb, we kept the old UDP and added
> UDPIFC to avoid potential regressions, since there were a lot of UDP
> deployments of gpdb at that time. After UDPIFC was released, it proved to
> be much more stable and to perform better than UDP. So when we released
> hawq, we just replaced UDP with UDPIFC but kept UDP as the name. So UDP is
> There are two flow control methods in UDPIFC; I'd suggest you give them a
> try: gp_interconnect_fc_method (INTERCONNECT_FC_METHOD_CAPACITY &
>> - gp_interconnect_queue_depth
>>    My naive read of this is performance can be traded off for (potentially
>> significant) RAM utilization. Is there additional detail around turning
>> this knob? How does the interaction between this and the underlying NIC
>> queue depth affect performance? As an example, in Azure, disabling TX
>> queuing (ifconfig eth0 txqueuelen 0) on the virtual NIC improved benchmark
>> performance, as the underlying Hyper-V host is doing its own queuing
>> anyway.
> This queue is an application-level queue, used for caching and for
> handling out-of-order and lost packets.
> According to our past performance testing on physical hardware, increasing
> it to a large value does not show much benefit, while too small a value
> does hurt performance. But it needs more testing on Azure, I think.
>> Thanks,
>> Kyle
>> --
>> *Kyle Dunn | Data Engineering | Pivotal*
>> Direct: 303.905.3171 <3039053171> | Email: kdunn@pivotal.io
