From: Richard Pickens <richardpickens02@gmail.com>
To: user@hadoop.apache.org
Subject: Re: Application of Cloudera Hadoop for Dataset analysis
Date: Tue, 5 Feb 2013 10:12:14 -0800

You can use the Hortonworks Data Platform, which already integrates HDFS, MapReduce, and Hive well.
http://hortonworks.com/products/hortonworksdataplatform/

I came across a new solution recently; they claim to be a Hadoop-based standard-SQL solution for data analytics.
http://queryio.com/hadoop-big-data-product/hadoop-hive.html

I have not tried it yet, but you can explore it.

-Richard

On Tue, Feb 5, 2013 at 10:07 AM, Preethi Vinayak Ponangi <vinayakponangi@gmail.com> wrote:

> It depends on which part of the Hadoop ecosystem you would like to use.
>
> You can do it in several ways:
>
> 1) You could write a basic MapReduce job to do the joins. This link could
> help, or a basic Google search will turn up several others:
> http://chamibuddhika.wordpress.com/2012/02/26/joins-with-map-reduce/
>
> 2) You could use a higher-level language like Pig to do these joins with
> simple Pig scripts.
> http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html
>
> 3) The simplest of all: you could write SQL-like queries to do the join
> using Hive.
> http://hive.apache.org/
>
> Hope this helps.
>
> Regards,
> Vinayak.
>
> On Tue, Feb 5, 2013 at 10:00 AM, Suresh Srinivas <suresh@hortonworks.com> wrote:
>
>> Please take this thread to the CDH mailing list.
>>
>> On Tue, Feb 5, 2013 at 2:43 AM, Sharath Chandra Guntuku <sharathchandra92@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am Sharath Chandra, an undergraduate student at BITS-Pilani, India. I
>>> would like the following clarifications regarding the Cloudera Hadoop
>>> distribution. I am using a CDH4 demo VM for now.
>>>
>>> 1. After I upload the files into the file browser, if I have to link
>>> two or three datasets using a key in those files, what should I do? Do I
>>> have to run a query over them?
>>>
>>> 2. My objective is that I have some data collected over a few years, and
>>> I would now like to link all of it, as in a database, using keys, and then
>>> run queries over it to find particular patterns. Later I would like to
>>> apply some machine-learning algorithms for predictive analysis. Will this
>>> be possible on the demo VM?
>>>
>>> I am totally new to this. Can I get some help? I would be very grateful.
>>>
>>> Thanks and Regards,
>>> Sharath Chandra Guntuku
>>> Undergraduate Student (Final Year)
>>> Computer Science Department
>>> Email: f2009149@hyderabad.bits-pilani.ac.in
>>> BITS-Pilani, Hyderabad Campus
>>> Jawahar Nagar, Shameerpet, RR Dist,
>>> Hyderabad - 500078, Andhra Pradesh
>>
>> --
>> http://hortonworks.com/download/
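The reduce-side join that option 1 above refers to can be sketched in a few lines. This is a minimal, hypothetical illustration in Hadoop Streaming style, not code from the thread: the `customers`/`orders` datasets and their key field are made up, and the shuffle/sort that Hadoop performs between the map and reduce phases is simulated with a plain `sorted()`.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(customers, orders):
    """Tag each record with its source dataset so the reducer can
    tell customer rows and order rows apart after they are mixed."""
    tagged = [(cust_id, "C", name) for cust_id, name in customers]
    tagged += [(cust_id, "O", item) for cust_id, item in orders]
    # In a real job, Hadoop's shuffle phase does this sort-by-key.
    return sorted(tagged)

def reduce_phase(tagged):
    """For each join key, pair every customer record with every
    order record that shares that key (an inner join)."""
    joined = []
    for key, group in groupby(tagged, key=itemgetter(0)):
        records = list(group)
        names = [payload for _, tag, payload in records if tag == "C"]
        items = [payload for _, tag, payload in records if tag == "O"]
        for name in names:
            for item in items:
                joined.append((key, name, item))
    return joined

customers = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]
print(reduce_phase(map_phase(customers, orders)))
# Only key 1 appears in both datasets, so only alice's rows join:
# [(1, 'alice', 'book'), (1, 'alice', 'pen')]
```

For comparison, option 3 collapses all of this into one statement in Hive, something like `SELECT c.name, o.item FROM customers c JOIN orders o ON c.id = o.cust_id` (hypothetical table names again), which is why Vinayak calls Hive the simplest route for someone new to Hadoop.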