arrow-dev mailing list archives

From "Malakhov, Anton" <anton.malak...@intel.com>
Subject RE: [DISCUSS][C++][Proposal] Threading engine for Arrow
Date Fri, 03 May 2019 16:30:20 GMT
Thanks for your answers,

> -----Original Message-----
> From: Antoine Pitrou [mailto:antoine@python.org]
> Sent: Friday, May 3, 2019 03:54

> Le 03/05/2019 à 05:47, Jed Brown a écrit :
> > I would caution to please not commit to the MKL/BLAS model in which
I'm actually talking about the threading-layer model, in which MKL supports several OpenMP runtimes
(Intel, GNU, PGI) and TBB, as well as a non-threaded version. It even supports selecting the layer dynamically;
please refer to: https://software.intel.com/en-us/mkl-macos-developer-guide-dynamically-selecting-the-interface-and-threading-layer
We implemented the same approach in Numba (#2245): https://numba.pydata.org/numba-doc/dev/user/threading-layer.html

> > the library creates threads internally.  It's a disaster for managing
> > oversubscription and affinity issues among groups of threads and/or
> > multiple processes (e.g., MPI).
This is exactly what I mean by threading composability issues! OpenMP is not easy to use
inside a library. I described this in the following document: https://cwiki.apache.org/confluence/display/ARROW/Parallel+Execution+Engine

> Implicit multi-threading is important for user-friendliness reasons (especially in
> higher-level bindings such as the Python-bindings).
I couldn't agree more! There might not be enough parallelism at the application level, so adding
parallelism from DSLs is important for better CPU utilization, but it is also tricky because
of these incompatibility issues.

> > The library is then free to use constructs like omp taskgroup/taskloop
> > as granularity warrants; it will never utilize threads that the
> > application didn't explicitly give it.
> 
> I don't think we're planning to use OpenMP in Arrow, though Wes probably has a
> better answer.
I wouldn't exclude OpenMP from consideration completely. I want to start with TBB, but nothing
composes with OpenMP better than OpenMP itself. MKL (and therefore NumPy, when built against MKL)
defaults to OpenMP threading. BTW, there is no longer a compatibility layer between TBB and OpenMP;
it was removed from the latter.


> -----Original Message-----
> From: Antoine Pitrou [mailto:antoine@python.org]
> Sent: Friday, May 3, 2019 03:49
> 
> Another possibility is to look at our C++ CSV reader and parser (in
> src/arrow/csv).  It's the only piece of Arrow that uses non-trivial multi-threading
> right now (with tasks spawning new tasks dynamically, see
> InferringColumnBuilder).  It's based on the ThreadPool and TaskGroup APIs (in
> src/arrow/util/).  These APIs are not set in stone, so you're free to propose
> changes to make them fit better with a TBB-based implementation.
Great! This is what I was looking for!


// Anton
