From: Apache Wiki
To: Apache Wiki
Reply-To: common-dev@hadoop.apache.org
Date: Wed, 16 Mar 2011 03:54:11 -0000
Message-ID: <20110316035411.99390.98598@eosnew.apache.org>
Subject: [Hadoop Wiki] Update of "CUDA On Hadoop" by ChenHe

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "CUDA On Hadoop" page has been changed by ChenHe:
http://wiki.apache.org/hadoop/CUDA%20On%20Hadoop?action=diff&rev1=14&rev2=15

--------------------------------------------------

= Hadoop + CUDA =
Here I will share some experiences from a [[http://cse.unl.edu/~che/slides/cuda.pdf|CUDA performance study on Hadoop MapReduce clusters]].

== Methodology ==
From the parallel programming point of view, CUDA helps us parallelize a program at a second level if we regard the MapReduce framework as the first level of parallelization [[Figure 1]]. In our study, we provide a Hadoop+CUDA solution for two programming languages, Java and C/C++. The scheduling of GPU threads among grids and blocks is not considered in our study.

=== For Java programmers ===
If your MapReduce program is written in Java, you may need [[http://download.oracle.com/javase/6/docs/technotes/guides/jni/spec/jniTOC.html|JNI]] to make use of CUDA. However, [[http://www.jcuda.org|JCuda]] provides an easier solution. We introduce CUDA in our Map stage: the CUDA code is called from the map() method within the Map class, and it is easy to extend this to the Reduce stage if necessary. There are two ways to compile your CUDA code.

One is to write the CUDA code as a String variable in your Java code; JCuda will compile it for you automatically. The compiled binary file is placed in the tasktracker's working directory, which you can configure in the mapred-site.xml file.

The other is a little bit tricky: you compile the CUDA code into binary files in advance and move them to the tasktrackers' working directories, so that every tasktracker can access the compiled binaries. A sketch of this second approach is given below.
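As a rough illustration of the second approach, the sketch below shows a mapper (written against the org.apache.hadoop.mapreduce API) that loads a kernel compiled ahead of time with nvcc -ptx from the task's working directory and launches it through the JCuda driver bindings. It is only a sketch under assumptions that are not part of this page: the class name CudaSquareMapper, the file name vecSquare.ptx, the kernel name square (assumed to have the signature __global__ void square(float* data, int n)), and the record format (whitespace-separated floats) are all made up for the example.

{{{
import java.io.IOException;

import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.driver.*;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CudaSquareMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

    private CUfunction kernel;

    @Override
    protected void setup(Context context) {
        // Initialize the CUDA driver API once per map task (JVM).
        JCudaDriver.setExceptionsEnabled(true);
        JCudaDriver.cuInit(0);
        CUdevice device = new CUdevice();
        JCudaDriver.cuDeviceGet(device, 0);
        CUcontext ctx = new CUcontext();
        JCudaDriver.cuCtxCreate(ctx, 0, device);

        // Load the pre-compiled kernel (vecSquare.ptx is a hypothetical name)
        // from the tasktracker's working directory, where it was placed in advance.
        CUmodule module = new CUmodule();
        JCudaDriver.cuModuleLoad(module, "vecSquare.ptx");
        kernel = new CUfunction();
        JCudaDriver.cuModuleGetFunction(kernel, module, "square");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assume each input record is a whitespace-separated list of floats.
        String line = value.toString().trim();
        if (line.isEmpty()) {
            return;  // skip blank records
        }
        String[] tokens = line.split("\\s+");
        float[] data = new float[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            data[i] = Float.parseFloat(tokens[i]);
        }

        // Copy the record to the GPU, launch the kernel, and copy the result back.
        CUdeviceptr devData = new CUdeviceptr();
        JCudaDriver.cuMemAlloc(devData, data.length * Sizeof.FLOAT);
        JCudaDriver.cuMemcpyHtoD(devData, Pointer.to(data), data.length * Sizeof.FLOAT);

        Pointer kernelParams = Pointer.to(
                Pointer.to(devData),
                Pointer.to(new int[]{ data.length }));
        JCudaDriver.cuLaunchKernel(kernel,
                (data.length + 255) / 256, 1, 1,  // grid dimensions
                256, 1, 1,                        // block dimensions
                0, null, kernelParams, null);
        JCudaDriver.cuCtxSynchronize();

        JCudaDriver.cuMemcpyDtoH(Pointer.to(data), devData, data.length * Sizeof.FLOAT);
        JCudaDriver.cuMemFree(devData);

        // Emit the GPU output as a text record.
        StringBuilder out = new StringBuilder();
        for (float v : data) {
            out.append(v).append(' ');
        }
        context.write(key, new Text(out.toString().trim()));
    }
}
}}}

The first approach differs only in that the kernel source is kept in a String and compiled by JCuda at run time instead of being loaded from a pre-built file.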
=== For C/C++ programmers ===
We employ CUDA SDK programs in our experiments. For the CUDA SDK programs, we first digested the code and partitioned each program into portions for data generation, bootstrapping, and the CUDA kernels, with the former two components transformed, respectively, into a standalone data generator and a virtual method callable from the map method of our MapRed utility class. The CUDA kernel is kept as is, since we want to perform the same computation on the GPU, only in a distributed fashion. The data generator is augmented to take command-line arguments, so that we can specify input sizes and the output location for different experiment runs. We reuse the code for bootstrapping a kernel execution as part of the mapper workload, thus providing a seamless integration of CUDA and Hadoop. The architecture of the CUDA SDK programs ported onto Hadoop is shown in Figure 2. For reusability, we use an object-oriented design, abstracting the mapper and reducer functions into a base class, MapRed. For a different computation, we can override the following virtual methods defined by MapRed:

[[http://cse.unl.edu/~che/images/streaming-2.bmp|Figure 2]]

{{{
void processHadoopData(string& input);