From: "Subramanian, Sanjay (HQP)" <sanjay.subramanian@roberthalf.com>
To: user@hive.apache.org; "Prashant Kumar - ERS, HCL Tech"
Date: Mon, 13 Jan 2014 07:30:31 +0000
Subject: RE: One information about the Hive
Hi Vikas,

Welcome to the world of Hive!

The first book you should read is Programming Hive by Capriolo, Wampler, and Rutherglen:
http://www.amazon.com/Programming-Hive-Edward-Capriolo/dp/1449319335

This is a must read. I have benefited immensely from this book and the Hive user group (the group is kickass).

If you are not sure of the details of HDFS/Hadoop, then Hadoop: The Definitive Guide (Tom White) is also a must read.
My view is that you should know both very well eventually.

I have set up Hadoop and Hive clusters in three ways:
[1] Manually via tarballs (lightweight, but you need to know what you are installing and where)
[2] CDH and Cloudera Manager (heavyweight, but it does things in the background; easy to install and quick to set up on a sandbox for learning). Plus, Beeswax is a great starter UI for Hive queries.
[3] Using Amazon EMR Hive (in my experience this is the easiest and fastest to set up for learning Hive)

My suggestion: don't go for option [1]. You learn a lot there, but it could take time and you might feel frustrated as well.
 
If you go with option [2] above, then I suggest:
- 1 or 2 boxes with an i7 quad core (or an 8-core AMD FX 8300) and 16-32 GB RAM
- Download and install Cloudera Manager

If you don't have access to box(es) to install Hadoop/Hive, then the cheapest way to learn is by using Amazon EMR:
- First create an S3 bucket and a folder to store a data file called songs.txt

  1,2,lennon,john,nowhere man
  1,3,lennon,john,strawberry fields forever
  2,1,mccartney,paul,penny lane
  2,2,mccartney,paul,michelle
  2,3,mccartney,paul,yesterday
  3,1,harrison,george,while my guitar gently weeps
  3,2,harrison,george,i want to tell you
  3,3,harrison,george,think for yourself
  3,4,harrison,george,something
  4,1,starr,ringo,octopus's garden
  4,2,starr,ringo,with a little help from my friends

- Create a key pair from the AWS console and save the private key on your local desktop

- Create an EMR cluster with Hive installed

- ssh -i /path/on/your/desktop/to/amazonkeypair.pem hadoop@<some-ec2-instance-name>.compute.amazonaws.com

- On the Linux prompt:
  --> hive -e "CREATE EXTERNAL TABLE IF NOT EXISTS songs(id INT, SEQID INT, LASTNAME STRING, FIRSTNAME STRING, SONGNAME STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
  --> hive -e "SELECT songname FROM songs WHERE lastname = 'lennon' OR lastname = 'harrison'"
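The EMR steps above can be dry-run locally before touching a cluster: the sketch below writes the same songs.txt and applies the same filter that the second hive -e command expresses. This is only an illustration of the query semantics, not how Hive executes it (Hive compiles the SELECT into MapReduce jobs on the cluster); the local filename is the only assumption.

```python
# Write the sample data file: comma-delimited, matching
# ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' in the DDL above.
rows = [
    "1,2,lennon,john,nowhere man",
    "1,3,lennon,john,strawberry fields forever",
    "2,1,mccartney,paul,penny lane",
    "2,2,mccartney,paul,michelle",
    "2,3,mccartney,paul,yesterday",
    "3,1,harrison,george,while my guitar gently weeps",
    "3,2,harrison,george,i want to tell you",
    "3,3,harrison,george,think for yourself",
    "3,4,harrison,george,something",
    "4,1,starr,ringo,octopus's garden",
    "4,2,starr,ringo,with a little help from my friends",
]
with open("songs.txt", "w") as f:
    f.write("\n".join(rows) + "\n")

# Plain-Python equivalent of:
#   SELECT songname FROM songs WHERE lastname = 'lennon' OR lastname = 'harrison'
def select_songnames(path):
    out = []
    with open(path) as f:
        for line in f:
            # Column order matches the CREATE EXTERNAL TABLE statement.
            _id, _seq, lastname, _first, songname = line.rstrip("\n").split(",")
            if lastname in ("lennon", "harrison"):
                out.append(songname)
    return out

print(select_songnames("songs.txt"))
```

The point of the exercise is that the "table" is nothing more than the delimited file plus a schema; the query is just a filter and a projection over it.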

Hope this helps

Hive on !!!

sanjay


From: Vikas Parashar [para.vikas@gmail.com]
Sent: Sunday, January 12, 2014 10:50 PM
To: Prashant Kumar - ERS, HCL Tech
Cc: user@hive.apache.org
Subject: Re: One information about the Hive

Prashant,


Actually I just started reading about and trying to understand Hive. Could you please tell me how you learnt Hive? Did you do any training? Is there any institute which is reliable specifically for Hive training? I have read a lot of tutorials on the net, but I am still not able to correlate the files stored on the Hadoop cluster with how Hive actually works: the complete end-to-end transaction and its storage. Can you take some classes on a paid basis and clear up my questions? Please help me.

 
I have learnt from the community and my personal experience. What I can do is forward your request to some known members of the Big Data community.
 

 

Note: One important thing: can I post questions directly to you, if you do not mind and if I am not disturbing you?


Please put all questions on the community list only.
 

 

Thanks

Prashant

 

From: Vikas Parashar [mailto:para.vikas@gmail.com]
Sent: Monday, January 13, 2014 11:07 AM
To: user@hive.apache.org
Subject: Re: One information about the Hive

 

Prashant,

 

 

I am new to Hive. I am reading the documentation available on the Apache site and trying to draw a correlation between Hadoop and Hive, so please help me understand this:

As per my understanding, all the files of unstructured data are stored in the HDFS system across the Hadoop cluster. When we have to analyze that data, we use Hive.

Now I have some questions which I am not able to resolve:

 

1. When an engineer/business user wants to analyze data that is available in some file on the HDFS cluster, what are the steps to get the desired file and analyze it using Hive?

 

You need to map it with HDFS. With the help of MapReduce, initially you need to create some metadata in HCatalog. Maybe this will help you: http://hortonworks.com/use-cases/sentiment-analysis-hadoop-example/
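To make that mapping concrete: an external Hive table is just metadata (a column schema plus an HDFS location) recorded in the metastore, which HCatalog exposes to other tools; the data file itself stays where it is on HDFS. A hedged sketch, where the LOCATION path is a hypothetical HDFS directory:

```sql
-- Metadata only: Hive records the column schema, the delimiter, and the
-- HDFS directory in the metastore; no data is copied or moved.
CREATE EXTERNAL TABLE IF NOT EXISTS songs (
  id INT, seqid INT, lastname STRING, firstname STRING, songname STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hadoop/songs';  -- hypothetical: directory holding songs.txt
```

Dropping an external table removes only this metastore entry; the files under the LOCATION directory are untouched.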

 

 

2. Does Hive store all the data in its tables permanently after the analysis?

 

Hive never stores any data itself.

 

3. Is Hive itself a database?

 

It is just a data-access framework.


Thanks

Prashant


