hadoop-common-commits mailing list archives

From: Apache Wiki <wikidi...@apache.org>
Subject: [Hadoop Wiki] Update of "CUDA On Hadoop" by ChenHe
Date: Wed, 16 Mar 2011 03:54:11 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "CUDA On Hadoop" page has been changed by ChenHe.
http://wiki.apache.org/hadoop/CUDA%20On%20Hadoop?action=diff&rev1=14&rev2=15

--------------------------------------------------

  
  = Hadoop + CUDA =
  Here, I will share some experiences from a [[http://cse.unl.edu/~che/slides/cuda.pdf|CUDA performance study on Hadoop MapReduce clusters]].
  
  == Methodology ==
  From the parallel programming point of view, CUDA can help us parallelize a program at a second level if we regard the MapReduce framework as the first level of parallelization (Figure 1). In our study, we provide a Hadoop+CUDA solution for two programming languages: Java and C/C++. The scheduling of GPU threads among grids and blocks is not considered in our study.
  
  === For Java programmers ===
  If your MapReduce program is written in Java, you may need [[http://download.oracle.com/javase/6/docs/technotes/guides/jni/spec/jniTOC.html|JNI]] to make use of CUDA. However, [[http://www.jcuda.org|JCuda]] provides an easier solution for us. We introduce CUDA into our Map stage: the CUDA code is called by the map() method within the Map class. It is easy to extend this to the Reduce stage if necessary. There are two ways to compile your CUDA code.
  
  One is to write the CUDA code as a String variable in your Java code. JCuda will automatically compile it for you. The compiled binary file is located in the tasktracker's working directory, which you can configure in the mapred-site.xml file.
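
  As an illustration of this first approach (this sketch is not from our study's code), a mapper can keep its kernel in a String and hand it to JCuda's KernelLauncher utility (jcuda.utils.KernelLauncher), which is assumed here to compile the source, set up the CUDA context, and accept device pointers and primitives as kernel arguments; the kernel, class name, and record format below are invented for the example:

  {{{
import java.io.IOException;
import java.util.Arrays;
import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.driver.CUdeviceptr;
import jcuda.utils.KernelLauncher;
import static jcuda.driver.JCudaDriver.*;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CudaStringMapper extends Mapper<LongWritable, Text, Text, Text> {

    // CUDA kernel kept as a String; JCuda compiles it when the task starts,
    // and the compiled binary lands in the tasktracker's working directory.
    private static final String KERNEL_SOURCE =
        "extern \"C\" __global__ void scale(float* data, int n, float factor)" +
        "{ int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) data[i] *= factor; }";

    private KernelLauncher launcher;

    @Override
    protected void setup(Context context) {
        launcher = KernelLauncher.compile(KERNEL_SOURCE, "scale");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Illustrative record format: one line of whitespace-separated floats.
        String[] parts = value.toString().trim().split("\\s+");
        int n = parts.length;
        float[] data = new float[n];
        for (int i = 0; i < n; i++) data[i] = Float.parseFloat(parts[i]);

        // Copy the record to the GPU, run the kernel, and copy the result back.
        CUdeviceptr devData = new CUdeviceptr();
        cuMemAlloc(devData, (long) n * Sizeof.FLOAT);
        cuMemcpyHtoD(devData, Pointer.to(data), (long) n * Sizeof.FLOAT);
        launcher.setBlockSize(256, 1, 1);
        launcher.setGridSize((n + 255) / 256, 1);
        launcher.call(devData, n, 2.0f);
        cuMemcpyDtoH(Pointer.to(data), devData, (long) n * Sizeof.FLOAT);
        cuMemFree(devData);

        context.write(new Text(Long.toString(key.get())), new Text(Arrays.toString(data)));
    }
}
  }}}

  The GPU work stays entirely inside map(), so the job is configured and submitted like any other Hadoop job.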
  
  The other is a little bit tricky: you can manually compile the CUDA code into binary files in advance and move them to the tasktrackers' working directories. Then every tasktracker can access those compiled binary files.
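
  A corresponding sketch of the second approach (again an illustration, not the page's own code): the kernel is compiled in advance, e.g. with "nvcc -ptx scale.cu -o scale.ptx", the resulting scale.ptx is placed in every tasktracker's working directory (or distributed through the DistributedCache), and the mapper loads it with the JCuda driver API instead of compiling a String at runtime:

  {{{
import java.io.IOException;
import java.util.Arrays;
import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.driver.CUcontext;
import jcuda.driver.CUdevice;
import jcuda.driver.CUdeviceptr;
import jcuda.driver.CUfunction;
import jcuda.driver.CUmodule;
import static jcuda.driver.JCudaDriver.*;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CudaPtxMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final CUfunction kernel = new CUfunction();

    @Override
    protected void setup(Context context) {
        // Standard JCuda driver-API bootstrapping on GPU 0, then load the
        // precompiled scale.ptx from the task's current working directory.
        cuInit(0);
        CUdevice device = new CUdevice();
        cuDeviceGet(device, 0);
        CUcontext cuContext = new CUcontext();
        cuCtxCreate(cuContext, 0, device);
        CUmodule module = new CUmodule();
        cuModuleLoad(module, "scale.ptx");
        cuModuleGetFunction(kernel, module, "scale");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Same illustrative record format as above: whitespace-separated floats.
        String[] parts = value.toString().trim().split("\\s+");
        int n = parts.length;
        float[] data = new float[n];
        for (int i = 0; i < n; i++) data[i] = Float.parseFloat(parts[i]);

        CUdeviceptr devData = new CUdeviceptr();
        cuMemAlloc(devData, (long) n * Sizeof.FLOAT);
        cuMemcpyHtoD(devData, Pointer.to(data), (long) n * Sizeof.FLOAT);

        // Kernel arguments are passed as a pointer to an array of pointers,
        // matching __global__ void scale(float* data, int n, float factor).
        Pointer kernelParams = Pointer.to(
                Pointer.to(devData), Pointer.to(new int[]{n}), Pointer.to(new float[]{2.0f}));
        int block = 256;
        cuLaunchKernel(kernel, (n + block - 1) / block, 1, 1, block, 1, 1, 0, null, kernelParams, null);
        cuCtxSynchronize();

        cuMemcpyDtoH(Pointer.to(data), devData, (long) n * Sizeof.FLOAT);
        cuMemFree(devData);
        context.write(new Text(Long.toString(key.get())), new Text(Arrays.toString(data)));
    }
}
  }}}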
  
  === For C/C++ programmers ===
  We employ CUDA SDK programs in our experiments. For the CUDA SDK programs, we first digested the code and partitioned each program into portions for data generation, bootstrapping, and CUDA kernels, with the former two components transformed respectively into a standalone data generator and a virtual method callable from the map method in our MapRed utility class. The CUDA kernel is kept as-is, since we want to perform the same computation on the GPU, only in a distributed fashion. The data generator is augmented to take command-line arguments so that we can specify input sizes and output locations for different experiment runs. We reuse the code for bootstrapping a kernel execution as part of the mapper workload, thus providing a seamless integration of CUDA and Hadoop. The architecture of the CUDA SDK programs ported onto Hadoop is shown in Figure 2. For reusability, we have used an object-oriented design, abstracting the mapper and reducer functions into a base class, i.e., MapRed. For different computations, we can override the following virtual methods defined by MapRed:
  
  [[http://cse.unl.edu/~che/images/streaming-2.bmp|Figure 2]]
  
  {{{
   void processHadoopData(string& input);
