lucene-dev mailing list archives

From Joel Bernstein <joels...@gmail.com>
Subject Re: [jira] [Commented] (SOLR-5069) MapReduce for SolrCloud
Date Wed, 20 May 2015 15:40:22 GMT
If you're going to be shuffling data to multiple worker nodes, then data
will be crossing the network. Shuffling provides the foundation for certain
parallel computing tasks, such as large-scale parallel relational algebra.
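A minimal sketch of what the shuffle buys you, assuming simple hash partitioning on a key. `Tuple`, `shuffle` and `sumByKey` are hypothetical names for illustration, not Solr APIs. Every tuple with the same key is routed to the same worker. That lets each worker finish its share of a relational operation on its own, here a GROUP BY key / SUM(value), without talking to the other workers:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {
    // Hypothetical tuple type: a partition key plus a numeric value.
    record Tuple(String key, long value) {}

    // Shuffle step: route each tuple to a worker by hashing its key, so all
    // tuples sharing a key end up on the same worker. Math.floorMod keeps the
    // index non-negative even when hashCode() is negative.
    static List<List<Tuple>> shuffle(List<Tuple> tuples, int numWorkers) {
        List<List<Tuple>> partitions = new ArrayList<>();
        for (int i = 0; i < numWorkers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Tuple t : tuples) {
            int worker = Math.floorMod(t.key().hashCode(), numWorkers);
            partitions.get(worker).add(t);
        }
        return partitions;
    }

    // Work done on a single worker after the shuffle: GROUP BY key, SUM(value).
    // Because the shuffle co-located every key, each worker's sums are final.
    static Map<String, Long> sumByKey(List<Tuple> partition) {
        Map<String, Long> sums = new TreeMap<>();
        for (Tuple t : partition) {
            sums.merge(t.key(), t.value(), Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Tuple> tuples = List.of(
            new Tuple("a", 1), new Tuple("b", 2), new Tuple("a", 3), new Tuple("c", 4));
        List<List<Tuple>> partitions = shuffle(tuples, 2);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("worker " + i + ": " + sumByKey(partitions.get(i)));
        }
    }
}
```

The cost is exactly the point above: every tuple may leave the node it started on, so the shuffle trades network transfer for the ability to run joins and aggregations in parallel.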

For machine learning algorithms, we'll likely need a parallel iterative
design that leaves the data in place.
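A rough sketch of that kind of iterative design, assuming a toy model: fitting y = w * x by least-squares gradient descent over data split across shards. `Shard`, `partialGradient` and `fit` are hypothetical names for illustration, not Solr APIs. Each iteration ships only the current w out to the shards and one partial gradient per shard back; the data points themselves never move:

```java
import java.util.List;

public class IterativeSketch {
    // Hypothetical shard: its data points stay here for every iteration.
    record Shard(double[] xs, double[] ys) {
        // Runs locally on the shard: partial gradient of the squared error
        // sum((w*x - y)^2) with respect to w. Only this scalar leaves the shard.
        double partialGradient(double w) {
            double g = 0.0;
            for (int i = 0; i < xs.length; i++) {
                g += 2.0 * (w * xs[i] - ys[i]) * xs[i];
            }
            return g;
        }

        int size() {
            return xs.length;
        }
    }

    // Coordinator loop: send w to the shards, gather one partial gradient per
    // shard, and update w using the mean gradient.
    static double fit(List<Shard> shards, double learningRate, int iterations) {
        int n = 0;
        for (Shard s : shards) {
            n += s.size();
        }
        double w = 0.0;
        for (int iter = 0; iter < iterations; iter++) {
            double gradient = 0.0;
            for (Shard s : shards) {
                gradient += s.partialGradient(w); // in a cluster these calls would run in parallel
            }
            w -= learningRate * gradient / n;
        }
        return w;
    }

    public static void main(String[] args) {
        // Points on y = 3x, split across two shards.
        List<Shard> shards = List.of(
            new Shard(new double[] {1, 2}, new double[] {3, 6}),
            new Shard(new double[] {3, 4}, new double[] {9, 12}));
        System.out.println("w = " + fit(shards, 0.05, 200));
    }
}
```

This is also a case of processing data before anything is streamed out: the per-record work happens next to the data, and only small partial results cross the network on each iteration.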

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, May 20, 2015 at 4:11 PM, Yonik Seeley <yseeley@gmail.com> wrote:

> On Wed, May 20, 2015 at 11:06 AM, Noble Paul <noble.paul@gmail.com> wrote:
> > The problem with streaming is data locality. Data needs to be transferred
> > across the network to do the processing
>
> Nothing saying that you can't process data before it's streamed out, right?
>
> -Yonik
