impala-issues mailing list archives

From "Mostafa Mokhtar (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (IMPALA-5150) Uneven load distribution of work across NUMA nodes
Date Thu, 25 May 2017 15:06:04 GMT

     [ https://issues.apache.org/jira/browse/IMPALA-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar resolved IMPALA-5150.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.9.0

> Uneven load distribution of work across NUMA nodes
> --------------------------------------------------
>
>                 Key: IMPALA-5150
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5150
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.6.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Mostafa Mokhtar
>             Fix For: Impala 2.9.0
>
>         Attachments: Screen Shot 2017-03-31 at 12.12.10 PM.png
>
>
> While doing concurrency testing as part of competitive benchmarking, I noticed that it is very difficult to saturate all CPUs at 100%.
> Below is a snapshot from htop during a concurrency run; the state shown closely mimics steady state. Note that CPUs 41-60 are less busy than CPUs 1-20.
> I then ran the command below, which dumps each impalad's threads along with the processor each thread is currently assigned to:
> for i in $(pgrep impalad); do ps -mo pid,tid,fname,user,psr -p $i; done
> From the man page for ps:
> {code}
> psr        PSR      processor that process is currently assigned to.
> {code}
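> For example, the output can be aggregated per core to make any skew obvious; the awk step below is illustrative, not the exact command used:
> {code}
> # Count impalad threads per core; PSR is the processor each thread is currently assigned to.
> for i in $(pgrep impalad); do
>   ps -mo tid,psr -p "$i"
> done | awk '$2 ~ /^[0-9]+$/ {count[$2]++} END {for (c in count) print "cpu", c, "threads", count[c]}' | sort -k2,2n
> {code}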
> The output showed a large number of threads running on core 61; not surprisingly, the 1K threads are all thrift-server threads. I am wondering whether this is skewing the kernel's ability to distribute the threads evenly across the cores.
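> To check whether those threads are actually restricted by an explicit affinity mask, or simply left on that core by the scheduler, something along these lines could be used (illustrative, not part of the original investigation):
> {code}
> # Print the CPU affinity list of every impalad thread; if the masks cover all cores,
> # the clustering on core 61 comes from scheduler placement rather than explicit pinning.
> # Assumes a single impalad per host (pgrep -o picks the oldest match).
> for tid in $(ls /proc/$(pgrep -o impalad)/task); do
>   taskset -cp "$tid"
> done
> {code}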
> I did a follow-up experiment by profiling different core ranges on the system (a sketch of the profiling commands follows the results below):
> Run 80 concurrent queries dominated by shuffle exchange
> Profile cores 01-20 to foo_01-20
> Profile cores 41-60 to foo_41-60
> Results showed that:
> Cores 01-20 had 50% more instructions retired
> Cores 01-20 showed significantly more contention on pthread_cond_wait, base::internal::SpinLockDelay, and __lll_lock_wait
> Skew is dominated by DataStreamSender
> ScannerThread(s) also show significant skew
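> A profiling run along the following lines would reproduce the per-core-range comparison; the exact options and CPU numbering are approximate (htop numbers cores from 1, perf from 0):
> {code}
> # Approximate sketch of per-core-range profiling during the 80-query run
> # (perf numbers CPUs from 0, so htop cores 01-20 map to 0-19 here; options are illustrative).
> perf record -e instructions -C 0-19  -o foo_01-20 -- sleep 60   # sample cores 0-19 only
> perf record -e instructions -C 40-59 -o foo_41-60 -- sleep 60   # sample cores 40-59 only
> perf report -i foo_01-20 --sort comm,symbol                     # inspect hot symbols per range
> {code}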



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
