horn-dev mailing list archives

From Zachary Jaffee <...@case.edu>
Subject Re: [DISCUSS] CPU-only for large NN
Date Mon, 07 Dec 2015 21:52:04 GMT
I agree; trying to compete with the many GPU-oriented projects would be
very difficult, so for now building out the CPU cluster makes more sense
than the other alternatives.

On Sun, Dec 6, 2015 at 6:43 PM, Edward J. Yoon <edward.yoon@samsung.com>
wrote:

> Good insights :-)
>
> Basically, it's hard to expect a GPU to speed up a sparse NN, because GPUs
> are optimized for dense matrix multiplication. To regularize a NN, there
> are drop-out and pruning techniques, as you know. So I think a CPU-based
> Apache Horn can be more flexible and useful for both data and model
> parallelism than the alternatives.
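
A minimal sketch of the sparse-vs-dense point above, assuming numpy and
scipy are available and using made-up sizes and a 90% pruning ratio: the
dense kernel still touches every entry, while the sparse CPU kernel only
touches the surviving nonzeros.

    import numpy as np
    from scipy import sparse

    rng = np.random.default_rng(0)
    W = rng.standard_normal((4096, 4096)).astype(np.float32)

    # Magnitude pruning: zero out the 90% smallest weights.
    threshold = np.quantile(np.abs(W), 0.90)
    W[np.abs(W) < threshold] = 0.0

    W_csr = sparse.csr_matrix(W)   # stores only the surviving ~10% of weights
    x = rng.standard_normal(4096).astype(np.float32)

    y_dense = W @ x                # dense kernel works on all 16M entries
    y_sparse = W_csr @ x           # sparse kernel works on ~1.6M nonzeros
    assert np.allclose(y_dense, y_sparse, atol=1e-3)
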
>
> However, if a high-performance, highly optimized system is required in
> real-world applications, a GPU might be better. So what I (originally)
> meant is that we need to focus more on a flexible and scalable CPU
> cluster, instead of competing with the GPU-oriented projects.
>
> --
> Best Regards, Edward J. Yoon
>
> -----Original Message-----
> From: Zachary Jaffee [mailto:zij@case.edu]
> Sent: Saturday, December 05, 2015 5:38 AM
> To: Unknown
> Subject: Re: [DISCUSS] CPU-only for large NN
>
> Isn't it the case that GPU computing is better for matrix multiplication
> and other heavy mathematical workloads, meaning that we would want to
> incorporate it eventually in some way? That said, I do think a sparser
> neural network will see a benefit from being trained on CPUs: the
> computations are simpler, and the time it would take to send the data to
> the GPU might make things slower. However, the paper you emailed out
> yesterday mentioned that sparse networks have lower accuracy, so that
> might be something else we want to look at. On the other hand, in this
> paper (http://arxiv.org/pdf/1102.4240.pdf), sparsity is discussed as a way
> of giving an RNN more versatility, so that's an option if it's something
> we want to look into.
>
> I think it's becoming clearer that data parallelism would perform better
> on CPUs, whereas model parallelism would work better on GPUs, which makes
> sense according to what I'm reading here
> (http://arxiv.org/pdf/1404.5997v2.pdf), due to the compute vs. data-transfer
> bottlenecks. If we could figure out a way to determine whether computation
> per weight or computation per neuron is higher for any given subproblem,
> and could switch quickly between running on the CPU and the GPU when that
> is detected, we would be in a very good place. Namely, for the various
> sub-nets within a massively large neural network, we would run the sparser
> parts on CPUs using data-parallel techniques, and when we see a denser
> sub-net we would take advantage of the GPU using model-parallel
> techniques.
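
For what it's worth, the switching idea could start out as something as
crude as a per-layer density check. A hypothetical sketch in Python; the
function names and the 10% cut-off are assumptions for illustration, not
anything Horn provides today:

    import numpy as np

    def weight_density(W: np.ndarray) -> float:
        """Fraction of nonzero weights (a crude proxy for how dense the
        per-neuron computation is)."""
        return float(np.count_nonzero(W)) / W.size

    def choose_backend(W: np.ndarray, density_cutoff: float = 0.10) -> str:
        """Dense sub-nets -> model-parallel GPU; sparse sub-nets ->
        data-parallel CPU."""
        if weight_density(W) >= density_cutoff:
            return "gpu/model-parallel"
        return "cpu/data-parallel"

    # Hypothetical example: a heavily pruned hidden layer vs. a small
    # dense output layer.
    rng = np.random.default_rng(0)
    pruned = rng.standard_normal((1024, 1024)) * (rng.random((1024, 1024)) < 0.02)
    dense = rng.standard_normal((1024, 10))
    print(choose_backend(pruned))  # -> cpu/data-parallel
    print(choose_backend(dense))   # -> gpu/model-parallel
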
>
> I also think that this matches how a biological system works: the human
> brain is highly sparse, but certain areas of the brain are denser and
> handle the more computationally intense activities such as vision. As we
> have seen with image detection and tagging, a toolkit that focuses on
> dense, parallel computation makes the most sense at that micro scale. But
> when it comes to a more macro-scale problem, i.e. building a single system
> that links vision with motor movements, I think it's safe to say that
> speed is much more important there.
>
> I also could be misunderstanding how this works, so let me know if what I
> am saying makes sense.
>
> On Fri, Dec 4, 2015 at 5:30 AM, Edward J. Yoon <edwardyoon@apache.org>
> wrote:
>
> > Hi folks,
> >
> > Instead of competing (on performance) with the GPU-based deep learning
> > projects, I realized that we need to focus more on the CPU cluster and
> > optimization.
> >
> > For instance, we can provide an easy way to design a large-scale neural
> > network with a more intuitive programming interface. It could also be
> > used to reduce the model size with pruning techniques, so that the model
> > can fit in GPU memory (and if the network becomes more and more sparse
> > through pruning, the CPU might have the advantage over the GPU).
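
A back-of-the-envelope sketch of the "prune until it fits" idea, in Python;
the layer shape, the 95% pruning ratio, and the CSR-style storage (one
value and one column index per surviving weight, plus one row pointer per
row) are illustrative assumptions only.

    rows, cols = 8192, 8192
    keep_ratio = 0.05                    # keep only the largest 5% of weights

    dense_bytes = rows * cols * 4        # float32 weights
    nnz = int(rows * cols * keep_ratio)
    csr_bytes = nnz * (4 + 4) + (rows + 1) * 4   # values + col idx + row ptrs

    print(f"dense:  {dense_bytes / 2**20:.0f} MiB")
    print(f"pruned: {csr_bytes / 2**20:.0f} MiB ({keep_ratio:.0%} kept)")

Roughly 256 MiB dense shrinks to about 26 MiB here, which is the kind of
reduction that can make a model fit in GPU memory, or keep it cheap enough
to stay on the CPU.
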
> >
> > WDYT?
> >
> > --
> > Best Regards, Edward J. Yoon
> >
>
>
>
> --
> Zach Jaffee
> B.S. Computer Science
> Case Western Reserve University Class of 2017
> Operations Director | WRUW FM 91.1 Cleveland
> Secretary | Recruitment Chair | Phi Kappa Theta Fraternity
> (917) 881-0646
> zjaffee.com
> github.com/ZJaffee
>
>
>


-- 
Zach Jaffee
B.S. Computer Science
Case Western Reserve University Class of 2017
Operations Director | WRUW FM 91.1 Cleveland
Secretary | Recruitment Chair | Phi Kappa Theta Fraternity
(917) 881-0646
zjaffee.com
github.com/ZJaffee
