From: Preethi Vinayak Ponangi <vinayakponangi@gmail.com>
Date: Tue, 5 Feb 2013 10:07:47 -0600
Subject: Re: Application of Cloudera Hadoop for Dataset analysis
To: user@hadoop.apache.org

It depends on which Hadoop ecosystem component you would like to use.
You can do it in several ways:

1) You could write a basic MapReduce job to do the joins. This link could
help, or a basic Google search will turn up several more:
http://chamibuddhika.wordpress.com/2012/02/26/joins-with-map-reduce/

2) You could use a higher-level data-flow language like Pig and express
the joins as simple Pig scripts:
http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html

3) Simplest of all, you could write SQL-like queries to do the join
using Hive:
http://hive.apache.org/

Hope this helps.

Regards,
Vinayak.

On Tue, Feb 5, 2013 at 10:00 AM, Suresh Srinivas <suresh@hortonworks.com> wrote:
> Please take this thread to the CDH mailing list.
>
> On Tue, Feb 5, 2013 at 2:43 AM, Sharath Chandra Guntuku
> <sharathchandra92@gmail.com> wrote:
>> Hi,
>>
>> I am Sharath Chandra, an undergraduate student at BITS-Pilani, India.
>> I would like to get the following clarifications regarding the
>> Cloudera Hadoop distribution. I am using a CDH4 demo VM for now.
>>
>> 1. After I upload the files into the file browser, if I have to link
>> two or three datasets using a key in those files, what should I do?
>> Do I have to run a query over them?
>>
>> 2. My objective is that I have some data collected over a few years,
>> and now I would like to link all of it, as in a database, using keys,
>> and then run queries over it to find particular patterns. Later I
>> would like to apply some machine-learning algorithms to it for
>> predictive analysis. Will this be possible on the demo VM?
>>
>> I am totally new to this. Can I get some help on this? I would be
>> very grateful.
>>
>> Thanks and Regards,
>> Sharath Chandra Guntuku
>> Undergraduate Student (Final Year)
>> Computer Science Department
>> Email: f2009149@hyderabad.bits-pilani.ac.in
>>
>> BITS-Pilani, Hyderabad Campus
>> Jawahar Nagar, Shameerpet, RR Dist,
>> Hyderabad - 500078, Andhra Pradesh
>
> --
> http://hortonworks.com/download/
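To make option 1 concrete: a reduce-side join can be sketched in plain Python, with no Hadoop involved. The `users`/`orders` records and the "L"/"R" tags below are invented for illustration; a real job would emit the same tagged pairs from two mappers and let the shuffle do the grouping.

```python
from collections import defaultdict

# Two hypothetical datasets keyed by user id (names invented for illustration).
users = [("u1", "Alice"), ("u2", "Bob")]
orders = [("u1", "book"), ("u1", "pen"), ("u2", "lamp")]

def map_phase(records, tag):
    # A mapper emits (key, (tag, value)); the tag records which
    # dataset each value came from.
    for key, value in records:
        yield key, (tag, value)

def reduce_phase(tagged):
    # The shuffle groups all values sharing a key; the reducer then
    # pairs every left-side value with every right-side value (an inner join).
    grouped = defaultdict(list)
    for key, tagged_value in tagged:
        grouped[key].append(tagged_value)
    for key, values in sorted(grouped.items()):
        left = [v for t, v in values if t == "L"]
        right = [v for t, v in values if t == "R"]
        for l in left:
            for r in right:
                yield key, l, r

tagged = list(map_phase(users, "L")) + list(map_phase(orders, "R"))
joined = list(reduce_phase(tagged))
print(joined)
# → [('u1', 'Alice', 'book'), ('u1', 'Alice', 'pen'), ('u2', 'Bob', 'lamp')]
```

The blog post linked above covers the same idea plus map-side variants that avoid shipping both datasets through the shuffle.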
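For option 2, the same join might look like the following Pig Latin sketch; the file names and schemas are hypothetical, assuming tab-delimited input files in HDFS.

```pig
-- Hypothetical tab-delimited inputs; names and schemas are invented.
users  = LOAD 'users.tsv'  AS (user_id:chararray, name:chararray);
orders = LOAD 'orders.tsv' AS (user_id:chararray, item:chararray);

-- Inner join on the shared key; Pig compiles this to MapReduce for you.
joined = JOIN users BY user_id, orders BY user_id;
DUMP joined;
```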
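For option 3, Hive lets you express the join as an ordinary SQL query; the table names and columns below are hypothetical, assuming tab-delimited files loaded into Hive tables.

```sql
-- Hypothetical tables; names and schemas are invented for illustration.
CREATE TABLE users  (user_id STRING, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
CREATE TABLE orders (user_id STRING, item STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Hive turns this standard join into MapReduce jobs behind the scenes.
SELECT u.name, o.item
FROM users u
JOIN orders o ON u.user_id = o.user_id;
```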