From: Preethi Vinayak Ponangi <vinayakponangi@gmail.com>
Date: Tue, 5 Feb 2013 10:07:47 -0600
Subject: Re: Application of Cloudera Hadoop for Dataset analysis
To: user@hadoop.apache.org

It depends on which Hadoop ecosystem component you would like to use.
You can do it in several ways:

1) You could write a basic MapReduce job to do the joins. This link could
help, or a basic Google search will turn up several more:
http://chamibuddhika.wordpress.com/2012/02/26/joins-with-map-reduce/

2) You could use a higher-level data-flow language like Pig and express
the joins as simple Pig scripts:
http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html

3) Simplest of all, you could write SQL-like queries to do the join
using Hive:
http://hive.apache.org/

Hope this helps.

Regards,
Vinayak.

On Tue, Feb 5, 2013 at 10:00 AM, Suresh Srinivas <suresh@hortonworks.com> wrote:
> Please take this thread to the CDH mailing list.
>
> On Tue, Feb 5, 2013 at 2:43 AM, Sharath Chandra Guntuku
> <sharathchandra92@gmail.com> wrote:
>> Hi,
>>
>> I am Sharath Chandra, an undergraduate student at BITS-Pilani, India.
>> I would like to get the following clarifications regarding the
>> Cloudera Hadoop distribution. I am using a CDH4 demo VM for now.
>>
>> 1. After I upload the files into the file browser, if I have to link
>> two or three datasets using a key in those files, what should I do?
>> Do I have to run a query over them?
>>
>> 2. My objective is that I have some data collected over a few years,
>> and now I would like to link all of it, as in a database, using keys,
>> and then run queries over it to find particular patterns. Later I
>> would like to apply some machine-learning algorithms to it for
>> predictive analysis. Will this be possible on the demo VM?
>>
>> I am totally new to this. Can I get some help on this? I would be
>> very grateful.
>>
>> Thanks and Regards,
>> Sharath Chandra Guntuku
>> Undergraduate Student (Final Year)
>> Computer Science Department
>> Email: f2009149@hyderabad.bits-pilani.ac.in
>>
>> BITS-Pilani, Hyderabad Campus
>> Jawahar Nagar, Shameerpet, RR Dist,
>> Hyderabad - 500078, Andhra Pradesh
>
> --
> http://hortonworks.com/download/
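To make option 1 concrete: a reduce-side join can be sketched in plain Python, with no Hadoop involved. The `users`/`orders` records and the "L"/"R" tags below are invented for illustration; a real job would emit the same tagged pairs from two mappers and let the shuffle do the grouping.

```python
from collections import defaultdict

# Two hypothetical datasets keyed by user id (names invented for illustration).
users = [("u1", "Alice"), ("u2", "Bob")]
orders = [("u1", "book"), ("u1", "pen"), ("u2", "lamp")]

def map_phase(records, tag):
    # A mapper emits (key, (tag, value)); the tag records which
    # dataset each value came from.
    for key, value in records:
        yield key, (tag, value)

def reduce_phase(tagged):
    # The shuffle groups all values sharing a key; the reducer then
    # pairs every left-side value with every right-side value (an inner join).
    grouped = defaultdict(list)
    for key, tagged_value in tagged:
        grouped[key].append(tagged_value)
    for key, values in sorted(grouped.items()):
        left = [v for t, v in values if t == "L"]
        right = [v for t, v in values if t == "R"]
        for l in left:
            for r in right:
                yield key, l, r

tagged = list(map_phase(users, "L")) + list(map_phase(orders, "R"))
joined = list(reduce_phase(tagged))
print(joined)
# → [('u1', 'Alice', 'book'), ('u1', 'Alice', 'pen'), ('u2', 'Bob', 'lamp')]
```

The blog post linked above covers the same idea plus map-side variants that avoid shipping both datasets through the shuffle.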
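For option 2, the same join might look like the following Pig Latin sketch; the file names and schemas are hypothetical, assuming tab-delimited input files in HDFS.

```pig
-- Hypothetical tab-delimited inputs; names and schemas are invented.
users  = LOAD 'users.tsv'  AS (user_id:chararray, name:chararray);
orders = LOAD 'orders.tsv' AS (user_id:chararray, item:chararray);

-- Inner join on the shared key; Pig compiles this to MapReduce for you.
joined = JOIN users BY user_id, orders BY user_id;
DUMP joined;
```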
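For option 3, Hive lets you express the join as an ordinary SQL query; the table names and columns below are hypothetical, assuming tab-delimited files loaded into Hive tables.

```sql
-- Hypothetical tables; names and schemas are invented for illustration.
CREATE TABLE users  (user_id STRING, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
CREATE TABLE orders (user_id STRING, item STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Hive turns this standard join into MapReduce jobs behind the scenes.
SELECT u.name, o.item
FROM users u
JOIN orders o ON u.user_id = o.user_id;
```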