commons-dev mailing list archives

From Gilles <gil...@harfang.homelinux.org>
Subject Re: [math] threading redux
Date Sun, 19 Apr 2015 02:14:10 GMT
On Fri, 17 Apr 2015 16:53:56 -0500, James Carman wrote:
> Do you have any pointers to code for this ForkJoin mechanism?  I'm
> curious to see it.
>
> The key thing you will need in order to support parallelization in a
> generic way

What do you mean by "generic way"?

I'm afraid that we may be trying to compare apples and oranges;
each of us probably has in mind a "prototype" algorithm and an idea
of how to implement it to make it run in parallel.

I think that it would focus the discussion if we could
1. tell what the "prototype" is,
2. show a sort of pseudo-code of the difference between a sequential
    and a parallel run of this "prototype" (i.e. what is the data, how
    the (sub)tasks operate on them).

Regards,
Gilles

> is to not tie it directly to threads, but use some
> abstraction layer above threads, since that may not be the "worker"
> method you're using at the time.
>
> On Fri, Apr 17, 2015 at 2:57 PM, Thomas Neidhart
> <thomas.neidhart@gmail.com> wrote:
>> On 04/17/2015 05:35 PM, Phil Steitz wrote:
>>> On 4/17/15 3:14 AM, Gilles wrote:
>>>> Hello.
>>>>
>>>> On Thu, 16 Apr 2015 17:06:21 -0500, James Carman wrote:
>>>>> Consider me poked!
>>>>>
>>>>> So, the Java answer to "how do I run things in multiple threads" is to
>>>>> use an Executor (java.util.concurrent).  This doesn't necessarily mean
>>>>> that you *have* to use a separate thread (the implementation could
>>>>> execute inline).  However, in order to accommodate the separate thread
>>>>> case, you would need to code to a Future-like API.  Now, I'm not saying
>>>>> to use Executors directly, but I'd provide some abstraction layer above
>>>>> them or in lieu of them, something like:
>>>>>
>>>>> import java.util.concurrent.Callable;
>>>>> import java.util.concurrent.Future;
>>>>>
>>>>> public interface ExecutorThingy {
>>>>>   <T> Future<T> execute(Callable<T> fn);
>>>>> }
>>>>>
>>>>> One could imagine implementing different ExecutorThingy
>>>>> implementations which allow you to parallelize things in different
>>>>> ways (simple threads, JMS, Akka, etc.)
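
To make the idea above concrete, here is a minimal sketch of such an
abstraction with two interchangeable implementations. It is only an
illustration, not an existing [math] API: the task type is assumed to be
java.util.concurrent.Callable, and the names InlineExecutorThingy,
PooledExecutorThingy and ExecutorThingyDemo are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

/** Abstraction over "where a task runs" (assumed shape, not a real [math] type). */
interface ExecutorThingy {
    <T> Future<T> execute(Callable<T> task);
}

/** Hypothetical implementation: runs the task on the calling thread. */
final class InlineExecutorThingy implements ExecutorThingy {
    @Override
    public <T> Future<T> execute(Callable<T> task) {
        FutureTask<T> future = new FutureTask<>(task);
        future.run(); // synchronous; the result or exception is stored in the future
        return future;
    }
}

/** Hypothetical implementation: delegates to a caller-supplied thread pool. */
final class PooledExecutorThingy implements ExecutorThingy {
    private final ExecutorService pool;

    PooledExecutorThingy(ExecutorService pool) {
        this.pool = pool;
    }

    @Override
    public <T> Future<T> execute(Callable<T> task) {
        return pool.submit(task);
    }
}

final class ExecutorThingyDemo {
    /** The "algorithm" only sees ExecutorThingy, never threads. */
    static long sumOfSquares(ExecutorThingy executor, int n) throws Exception {
        List<Future<Long>> parts = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            final long x = i;
            parts.add(executor.execute(new Callable<Long>() {
                @Override
                public Long call() {
                    return x * x;
                }
            }));
        }
        long total = 0;
        for (Future<Long> part : parts) {
            total += part.get(); // blocks until that subtask is done
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfSquares(new InlineExecutorThingy(), 10)); // 385
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(sumOfSquares(new PooledExecutorThingy(pool), 10)); // 385
        } finally {
            pool.shutdown();
        }
    }
}
```

The caller chooses the implementation; the algorithm code is the same in
both cases. Whether such fine-grained tasks would be efficient is a
separate question that this sketch does not address.
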
>>>>
>>>> I did not understand what is being suggested: parallelization of a
>>>> single algorithm or concurrent calls to multiple instances of an
>>>> algorithm?
>>>
>>> Really both.  It's probably best to look at some concrete examples.
>>> The two I mentioned in my apachecon talk are:
>>>
>>> 1.  Threads managed by some external process / application gathering
>>> statistics to be aggregated.
>>>
>>> 2.  Allowing multiple threads to concurrently execute GA
>>> transformations within the GeneticAlgorithm "evolve" method.
>>>
>>> It would be instructive to think about how to handle both of these
>>> use cases using something like what James is suggesting.  What is
>>> nice about his idea is that it could give us a way to let users /
>>> systems decide whether they want to have [math] algorithms spawn
>>> threads to execute concurrently or to allow an external execution
>>> framework to handle task distribution across threads.
>>
>> I think a more viable option is to take advantage of the ForkJoin
>> mechanism that we can use now in math 4.
>>
>> For example, the GeneticAlgorithm could be quite easily changed to use a
>> ForkJoinTask to perform each evolution. I will try to come up with an
>> example soon, as I plan to work on the genetics package anyway.
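
As a rough illustration of the ForkJoin approach mentioned above, the
sketch below parallelizes only one step of an evolution, the fitness
evaluation of a population, with a RecursiveAction. It does not use the
[math] genetics classes: the population is assumed to be a plain
double[][], and FitnessTask, ForkJoinFitnessDemo, the sum-of-squares
fitness and the threshold of 64 are all hypothetical choices.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

/** Hypothetical task: fills fitness[from..to) for the given population slice. */
final class FitnessTask extends RecursiveAction {
    private static final int THRESHOLD = 64; // arbitrary cut-off for sequential work

    private final double[][] population;
    private final double[] fitness;
    private final int from;
    private final int to;

    FitnessTask(double[][] population, double[] fitness, int from, int to) {
        this.population = population;
        this.fitness = fitness;
        this.from = from;
        this.to = to;
    }

    /** Stand-in fitness function (sum of squared genes); purely illustrative. */
    static double evaluate(double[] chromosome) {
        double sum = 0;
        for (double gene : chromosome) {
            sum += gene * gene;
        }
        return sum;
    }

    @Override
    protected void compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: evaluate this slice sequentially.
            for (int i = from; i < to; i++) {
                fitness[i] = evaluate(population[i]);
            }
            return;
        }
        // Otherwise split the range in two and process both halves as subtasks.
        int mid = (from + to) >>> 1;
        invokeAll(new FitnessTask(population, fitness, from, mid),
                  new FitnessTask(population, fitness, mid, to));
    }
}

final class ForkJoinFitnessDemo {
    static double[] evaluateAll(double[][] population) {
        double[] fitness = new double[population.length];
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new FitnessTask(population, fitness, 0, population.length));
        } finally {
            pool.shutdown();
        }
        return fitness;
    }
}
```

Each subtask writes to a disjoint slice of the fitness array, so no
locking is needed. Selection, crossover and mutation would need their own
treatment, since they share the population state.
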
>>
>> The idea outlined above sounds nice but it is very unclear how an
>> algorithm or function would perform its parallelization in such a way,
>> and whether it would still be efficient.
>>
>> Thomas
>>
>>> Since 2. above is a good example of "internal" parallelism and it
>>> also has data sharing / transfer challenges, maybe it's best to start
>>> with that one.  I have just started thinking about this and would
>>> love to get better ideas than my own hacking about how to do it
>>>
>>> a) Using Spark with RDDs to maintain population state data
>>> b) Hadoop with HDFS (or something else?)
>>>
>>> Phil
>>>>
>>>>
>>>> Gilles
>>>>
>>>>>> [...]



