arrow-dev mailing list archives

From Jed Brown <...@jedbrown.org>
Subject RE: [DISCUSS][C++][Proposal] Threading engine for Arrow
Date Fri, 03 May 2019 17:40:59 GMT
"Malakhov, Anton" <anton.malakhov@intel.com> writes:

>> > the library creates threads internally.  It's a disaster for managing
>> > oversubscription and affinity issues among groups of threads and/or
>> > multiple processes (e.g., MPI).
>
> This is exactly what I'm talking about referring as issues with threading composability!
> OpenMP is not easy to have inside a library. I described it in this document:
> https://cwiki.apache.org/confluence/display/ARROW/Parallel+Execution+Engine

Thanks for this document.  I'm no great fan of OpenMP, but it's being
billed by most vendors (especially Intel) as the way to go in the
scientific computing space and has become relatively popular (much more
so than TBB).

You linked to a NumPy discussion
(https://github.com/numpy/numpy/issues/11826) that is encountering the
same issues, but proposing solutions based on the global environment.
That is perhaps acceptable for typical Python callers due to the GIL,
but C++ callers may be using threads themselves.  A typical example:

App:
  calls libB sequentially:
    calls Arrow sequentially (wants to use threads)
  calls libC sequentially:
    omp parallel (creates threads somehow):
      calls Arrow from threads (Arrow should not create more)
  omp parallel:
    calls libD from threads:
      calls Arrow (Arrow should not create more)

Arrow doesn't need to know the difference between the libC and libD
cases, but it may make a difference to the implementation of those
libraries.  In both of these cases, the user may desire that Arrow
create tasks for load balancing reasons (but no new threads) so long as
they can run on the specified thread team.

I have yet to see a complete solution to this problem, but we should
work out which modes are worth supporting and how that interface would
look.
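
To make the question of "modes" concrete, here is a minimal sketch of one possible shape for such an interface, in which the caller passes the execution context per call instead of the library reading it from global state. Everything in it is hypothetical and for illustration only: TaskRunner, InlineRunner, TeamRunner, and ChunkedSum are invented names, not Arrow APIs, and TeamRunner spawns its threads per call only to keep the sketch short (a real application would more likely hand over a long-lived team).

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical interface: the caller, not the library, decides where tasks run.
struct TaskRunner {
  virtual ~TaskRunner() = default;
  // Must run every task to completion before returning.
  virtual void RunAll(std::vector<std::function<void()>>& tasks) = 0;
};

// For callers already inside a parallel region (the libC and libD cases):
// tasks run on the calling thread and no new threads are created.
struct InlineRunner : TaskRunner {
  void RunAll(std::vector<std::function<void()>>& tasks) override {
    for (auto& task : tasks) task();
  }
};

// For a sequential caller that wants parallelism (the libB case).
// The team belongs to the caller; threads pull tasks from a shared counter,
// which gives simple dynamic load balancing.
struct TeamRunner : TaskRunner {
  explicit TeamRunner(std::size_t num_threads) : num_threads_(num_threads) {}
  void RunAll(std::vector<std::function<void()>>& tasks) override {
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> team;
    for (std::size_t t = 0; t < num_threads_; ++t) {
      team.emplace_back([&] {
        for (std::size_t i = next.fetch_add(1); i < tasks.size();
             i = next.fetch_add(1)) {
          tasks[i]();
        }
      });
    }
    for (auto& th : team) th.join();
  }
  std::size_t num_threads_;
};

// Stand-in for a library kernel: it only creates tasks, never threads.
long ChunkedSum(const std::vector<int>& data, std::size_t chunk,
                TaskRunner& runner) {
  const std::size_t num_chunks = (data.size() + chunk - 1) / chunk;
  std::vector<long> partial(num_chunks, 0);
  std::vector<std::function<void()>> tasks;
  for (std::size_t c = 0; c < num_chunks; ++c) {
    tasks.push_back([&, c] {
      const std::size_t begin = c * chunk;
      const std::size_t end = std::min(begin + chunk, data.size());
      // Each task writes only its own slot, so no locking is needed.
      partial[c] = std::accumulate(data.begin() + begin, data.begin() + end, 0L);
    });
  }
  runner.RunAll(tasks);
  return std::accumulate(partial.begin(), partial.end(), 0L);
}
```

A sequential caller would pass something like TeamRunner(4), while code already running inside an omp parallel region would pass InlineRunner (or a runner bound to its own team), and the kernel itself does not need to know which case it is in.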


Global solutions like this one (linked by Antoine)

  https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/thread-pool.cc#L268

imply that threading mode is global and set via an environment variable,
neither of which is true in cases such as the above (and many simpler
cases).
