commons-dev mailing list archives

From Phil Steitz <phil.ste...@gmail.com>
Subject Re: [math] threading redux
Date Fri, 17 Apr 2015 16:47:29 GMT
On 4/17/15 9:01 AM, Gilles wrote:
> On Fri, 17 Apr 2015 08:35:42 -0700, Phil Steitz wrote:
>> On 4/17/15 3:14 AM, Gilles wrote:
>>> Hello.
>>>
>>> On Thu, 16 Apr 2015 17:06:21 -0500, James Carman wrote:
>>>> Consider me poked!
>>>>
>>>> So, the Java answer to "how do I run things in multiple threads"
>>>> is to
>>>> use an Executor (java.util.concurrent).  This doesn't necessarily mean
>>>> that you
>>>> *have* to use a separate thread (the implementation could execute
>>>> inline).  However, in order to accommodate the separate thread
>>>> case,
>>>> you would need to code to a Future-like API.  Now, I'm not
>>>> saying to
>>>> use Executors directly, but I'd provide some abstraction layer
>>>> above
>>>> them or in lieu of them, something like:
>>>>
>>>> import java.util.concurrent.Callable;
>>>> import java.util.concurrent.Future;
>>>>
>>>> public interface ExecutorThingy {
>>>>   <T> Future<T> execute(Callable<T> fn);
>>>> }
>>>>
>>>> One could imagine implementing different ExecutorThingy
>>>> implementations which allow you to parallelize things in different
>>>> ways (simple threads, JMS, Akka, etc, etc.)
>>>
>>> I did not understand what is being suggested: parallelization of a
>>> single algorithm or concurrent calls to multiple instances of an
>>> algorithm?
>>
>> Really both.  It's probably best to look at some concrete examples.
>
> Certainly...
>
>> The two I mentioned in my ApacheCon talk are:
>>
>> 1.  Threads managed by some external process / application gathering
>> statistics to be aggregated.
>>
>> 2.  Allowing multiple threads to concurrently execute GA
>> transformations within the GeneticAlgorithm "evolve" method.
>
> I could not view the presentation from the link previously mentioned
> (it did not work with my browser...).
> Can I download the PDF file from somewhere?

Sorry.  Try this (unshortened) link:

http://www.slideshare.net/psteitz/commons-mathapacheconna2015
>
>> It would be instructive to think about how to handle both of these
>> use cases using something like what James is suggesting.  What is
>> nice about his idea is that it could give us a way to let users /
>> systems decide whether they want to have [math] algorithms spawn
>> threads to execute concurrently or to allow an external execution
>> framework to handle task distribution across threads.
>
> Some (all?) cases of "external" parallelism are trivial for the CM
> developers: the user must chop his data, pass the chunks as arguments
> to the CM methods, then collect and reassemble the results, all by
> himself.
> IIUC the scenario, this cannot be deemed a "feature".

The idea is to make it easier for users to do this "chopping" and
"reassembling" and / or to let these operations be managed by
external frameworks.

The AggregatedStatistics class is a simple example of making it
easier for users to do this directly.
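
To make the "chopping" and "reassembling" concrete, here is a minimal
sketch in plain Java.  It deliberately does not use the [math] API:
ChunkedStats, Partial and summarize are hypothetical names made up for
this illustration, and a fixed thread pool is only one assumed way of
running the chunks.  The point is that each worker fills its own
accumulator and the results are merged at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedStats {

    /** Hypothetical mergeable accumulator; not a [math] class. */
    static final class Partial {
        long n;
        double sum;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;

        void add(double x) {
            n++;
            sum += x;
            min = Math.min(min, x);
            max = Math.max(max, x);
        }

        /** Folds another worker's partial result into this one. */
        void merge(Partial other) {
            n += other.n;
            sum += other.sum;
            min = Math.min(min, other.min);
            max = Math.max(max, other.max);
        }

        double mean() {
            return sum / n;
        }
    }

    /** Chops data into contiguous chunks, summarizes each on a pool thread, then merges. */
    static Partial summarize(double[] data, int chunks) {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            List<Future<Partial>> futures = new ArrayList<Future<Partial>>();
            int size = (data.length + chunks - 1) / chunks; // ceiling division
            for (int c = 0; c < chunks; c++) {
                final int from = Math.min(data.length, c * size);
                final int to = Math.min(data.length, from + size);
                Callable<Partial> task = () -> {
                    Partial p = new Partial(); // private to this task, so no locking
                    for (int i = from; i < to; i++) {
                        p.add(data[i]);
                    }
                    return p;
                };
                futures.add(pool.submit(task));
            }
            Partial total = new Partial();
            for (Future<Partial> f : futures) {
                try {
                    total.merge(f.get()); // blocks until that chunk is done
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        Partial p = summarize(data, 3);
        System.out.println(p.n + " values, mean " + p.mean()
                + ", min " + p.min + ", max " + p.max);
    }
}
```

Because every task writes only to its own Partial, the only
coordination needed is the final merge in the calling thread.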
>
>> Since 2. above is a good example of "internal" parallelism and it
>> also has data sharing / transfer challenges, maybe it's best to start
>> with that one.
>
> That's the scenario where usage is simple and performance can match
> the user's machine capability when running CM algorithms that are
> inherently parallel.
>
> There is an example in CM: see
>   testTravellerSalesmanSquareTourParallelSolver()
> in
>   org.apache.commons.math4.ml.neuralnet.sofm.KohonenTrainingTaskTest

The challenge is how to make this kind of thing possible "simply"
without just pegging the local machine's cores in an unmanaged
way.  I think James has the kernel of an idea that would allow us
to have it both ways - "greedy / local" or "managed / remotable."
This is all hand-waving at this point; but the idea that we could
find a way to make our parallelizable algorithms executable via
locally spawned threads or external task managers is appealing.
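
A rough sketch of what "having it both ways" might look like, under
stated assumptions: the interface name follows James's suggestion, but
the generic method signature, the two implementations (InlineRunner,
PooledRunner) and the toy squares() step are all hypothetical and
exist only for this illustration.  Nothing here is existing [math]
code.  The algorithm step codes against the interface and never
creates a thread itself; the caller decides how tasks are run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical abstraction; only the name comes from the thread. */
interface ExecutorThingy {
    <T> Future<T> execute(Callable<T> task);
}

/** Runs each task in the caller's thread; no threads are spawned. */
final class InlineRunner implements ExecutorThingy {
    public <T> Future<T> execute(Callable<T> task) {
        CompletableFuture<T> f = new CompletableFuture<T>();
        try {
            f.complete(task.call());
        } catch (Exception e) {
            f.completeExceptionally(e);
        }
        return f;
    }
}

/** Delegates to an ExecutorService that the caller supplies and owns. */
final class PooledRunner implements ExecutorThingy {
    private final ExecutorService pool;

    PooledRunner(ExecutorService pool) {
        this.pool = pool;
    }

    public <T> Future<T> execute(Callable<T> task) {
        return pool.submit(task);
    }
}

final class RunnerDemo {

    /** Stand-in for a parallelizable algorithm step: squares 0..n-1. */
    static List<Integer> squares(ExecutorThingy runner, int n) {
        List<Future<Integer>> pending = new ArrayList<Future<Integer>>();
        for (int i = 0; i < n; i++) {
            final int x = i;
            Callable<Integer> task = () -> x * x;
            pending.add(runner.execute(task));
        }
        List<Integer> out = new ArrayList<Integer>();
        for (Future<Integer> f : pending) {
            try {
                out.add(f.get()); // results are collected in submission order
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(squares(new InlineRunner(), 5));

        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(squares(new PooledRunner(pool), 5));
        } finally {
            pool.shutdown();
        }
    }
}
```

A "managed / remotable" variant would be a third implementation of the
same interface that hands tasks to an external framework; that part is
not sketched here.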
>
>> I have just started thinking about this and would
>> love to get better ideas than my own hacking about how to do it:
>>
>> a) Using Spark with RDDs to maintain population state data
>> b) Hadoop with HDFS (or something else?)
>
> I have zero experience with this but I'm interested to know more. :-)

I am also just learning Spark.  It will likely take me a while to
get something meaningful; but I will start playing with this.  Other
ideas / patches welcome!

Phil
>
> Regards,
> Gilles
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


