reef-dev mailing list archives

From Andrey Meleshko <andr...@microsoft.com>
Subject RE: IMRU initialization with train data
Date Tue, 05 Jul 2016 19:04:47 GMT
>...I think the example could
> be improved to be less confusing :)

Yes, the RandomInputDataset configuration is somewhat obfuscated,
but I think the problem is in the InProc Runner, not in the example.
The example does configure RandomInputDataset, which does create random data (2 doubles per
partition by default, in the RandomInputPartition class).
But this data is never requested by the InProc Runner:
- my breakpoint in RandomInputPartition.GetPartitionHandle() is never hit;
- the mapper gets its data only from UpdateFunction.Initialize().

Should the Runner fetch each partition's data to initialize MapInput first?
Currently it uses only UpdateResult.mapInput.
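A minimal Java sketch of the flow suggested above, assuming the runner should load each partition's data once, before the first iteration, and hand it to every map call alongside the map input produced by the update function. All types and methods here (InputPartition, RandomPartition, Runner, mapAll, handleRequests) are hypothetical stand-ins, not the REEF.NET IMRU API; the handleRequests counter only mimics the breakpoint on GetPartitionHandle().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.BiFunction;

// Hypothetical sketch: none of these types are the real REEF.NET IMRU API.
public class InProcRunnerSketch {

    /** Stand-in for an IMRU input partition. */
    interface InputPartition {
        double[] getPartitionHandle();
    }

    /** Stand-in for RandomInputPartition: two random doubles per partition. */
    static final class RandomPartition implements InputPartition {
        private final Random random;
        int handleRequests = 0; // plays the role of the breakpoint in the mail

        RandomPartition(long seed) {
            this.random = new Random(seed);
        }

        @Override
        public double[] getPartitionHandle() {
            handleRequests++;
            return new double[] {random.nextDouble(), random.nextDouble()};
        }
    }

    /** Hypothetical runner that loads partition data up front. */
    static final class Runner<I, O> {
        private final List<double[]> partitionData = new ArrayList<>();
        private final BiFunction<double[], I, O> mapFunction;

        Runner(List<? extends InputPartition> partitions,
               BiFunction<double[], I, O> mapFunction) {
            // Fetch each partition's data exactly once, before any iteration.
            for (InputPartition p : partitions) {
                partitionData.add(p.getPartitionHandle());
            }
            this.mapFunction = mapFunction;
        }

        /** One map phase: each mapper sees its own data plus the map input. */
        List<O> mapAll(I mapInput) {
            List<O> outputs = new ArrayList<>();
            for (double[] data : partitionData) {
                outputs.add(mapFunction.apply(data, mapInput));
            }
            return outputs;
        }
    }

    public static void main(String[] args) {
        List<RandomPartition> partitions = List.of(
            new RandomPartition(1), new RandomPartition(2), new RandomPartition(3));
        // Hypothetical map function: sum the partition's doubles, scaled by the map input.
        Runner<Double, Double> runner =
            new Runner<>(partitions, (data, w) -> w * (data[0] + data[1]));
        for (int iteration = 0; iteration < 2; iteration++) {
            System.out.println(runner.mapAll(1.0));
        }
        for (RandomPartition p : partitions) {
            System.out.println("handle requests: " + p.handleRequests); // 1 each
        }
    }
}
```

In this sketch the partition handle is requested once per partition regardless of the number of iterations, while the map input can still change every iteration; whether the real runner should cache the data this way or re-read it is an open design question.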

Andrey
> -----Original Message-----
> From: Markus Weimer [mailto:markus@weimo.de]
> Sent: Tuesday, July 5, 2016 10:02 AM
> To: dev@reef.apache.org
> Subject: Re: IMRU initialization with train data
> 
> On 2016-07-01 10:33 AM, Andrey Meleshko wrote:
> > 1) thanks for the pointer, I can see now that the BroadcastReduce example
> > is using RandomInputDataSet, which is initialized as part of driver
> > initialization. But it looks like partition data initialization is never
> > called and the MapFunction never gets data from the dataset
> 
> Yes, the example doesn't use any actual data. Instead, it just uses the
> dataset to determine the number of maps to run. I think the example could
> be improved to be less confusing :)
> 
> > Does anyone else think the pipeline configuration is a bit too
> > verbose/imperative?
> 
> Agreed. See REEF-1477 for the current thinking about an improvement.
> 
> Markus