hive-user mailing list archives

From "Subramanian, Sanjay (HQP)" <>
Subject RE: One information about the Hive
Date Mon, 13 Jan 2014 07:30:31 GMT
Hi Vikas

Welcome to the world of Hive!

The first book you should read is Programming Hive by Capriolo, Wampler, and Rutherglen.

This is a must read. I have immensely benefited from this book and the hive user group (the
group is kickass).

If you are not sure of the details of HDFS/Hadoop, then Hadoop: The Definitive Guide (Tom White)
is a must read.
My view is that you should eventually know both very well.

I have set up Hadoop and Hive clusters in three ways:
[1] manually through tarballs (lightweight, but you need to know what you are installing and where)
[2] CDH & Cloudera Manager (heavyweight, but it does things in the background; easy to
install and quick to set up on a sandbox and learn). Plus Beeswax is a great starter UI for
Hive queries
[3] Using Amazon EMR Hive (I realize this is the easiest and the fastest to set up to learn)

My suggestion: don't go for option [1]. You learn a lot there, but it could take time and you
might feel frustrated as well.

If you go with option [2] above, then I suggest
- 1 or 2 boxes - i7 quad core (or you can use an 8-core AMD FX 8300) with 16-32GB RAM
- download and install Cloudera Manager

If you don't have access to box(es) to install Hadoop/Hive, then the cheapest way to learn is
by using Amazon EMR
- First create an S3 bucket and a folder to store a data file called songs.txt

  1,2,lennon,john,nowhere man
  1,3,lennon,john,strawberry fields forever
  2,1,mccartney,paul,penny lane
  3,1,harrison,george,while my guitar gently weeps
  3,2,harrison,george,i want to tell you
  3,3,harrison,george,think for yourself
  4,1,starr,ringo,octopus's garden
  4,2,starr,ringo,with a little help from my friends

- Create a key pair from the AWS console and save the private key on your local desktop

- Create an EMR cluster with Hive installed

- ssh -i /path/on/your/desktop/to/amazonkeypair.pem   hadoop@<some-ec2-instance-name>

- On the Linux prompt (this assumes a Hive table named songs has already been defined over songs.txt)
 --> hive -e "select songname from songs where lastname='lennon' OR lastname = 'harrison'"
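
The query above only works once a table called songs exists. Here is a minimal sketch of one way to define it, written as a shell script that saves the HiveQL to a file. The bucket path s3://my-bucket/songs/ is hypothetical, and the column names artistid, songid and firstname are my guesses at what the first, second and fourth fields mean; only lastname and songname come from the query itself.

```shell
#!/bin/sh
# Sketch only: the bucket path and the column names artistid, songid and
# firstname are assumptions; lastname and songname come from the query.
cat > create_songs.hql <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS songs (
  artistid  INT,
  songid    INT,
  lastname  STRING,
  firstname STRING,
  songname  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/songs/';
EOF

# Run the DDL only where the hive CLI is installed (e.g. on the EMR master node).
if command -v hive >/dev/null 2>&1; then
  hive -f create_songs.hql
else
  echo "hive not found; wrote create_songs.hql"
fi
```

Because the table is EXTERNAL, dropping it later removes only the table definition and leaves songs.txt in the bucket.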

Hope this helps

Hive on !!!



From: Vikas Parashar []
Sent: Sunday, January 12, 2014 10:50 PM
To: Prashant Kumar - ERS, HCL Tech
Subject: Re: One information about the Hive


Actually, I have just started reading about and trying to understand Hive. Could you please tell me how you
learnt Hive? Did you take any training? Is there any institute that is reliable specifically for
Hive training? I have read a lot of tutorials on the net, but I am still not able to correlate the file
that is stored on the Hadoop cluster with how Hive actually works: the complete end-to-end
transaction and its storage. Could you take a class on a paid basis and clear up my questions?
Please help me.

I have learnt from the community and my personal experience. What I can do is forward your request
to some members of the Big Data community I know.

Note: One important thing: may I post questions directly to you, if you do not mind and if I
am not disturbing you?

Please put all questions on the community list only.


From: Vikas Parashar [<>]
Sent: Monday, January 13, 2014 11:07 AM
Subject: Re: One information about the Hive


I am new to Hive. I am reading the docs available on the Apache site and trying to work out
the relationship between Hadoop and Hive, so please help me understand this.
As per my understanding, all the files holding unstructured data are stored in HDFS across
the Hadoop cluster. When we have to analyze that data, we use Hive.
Now I have some questions that I am not able to answer:

1. When an engineer or business user wants to analyze data that is available in a file
on the HDFS cluster, what are the steps to get the desired file and analyze it using Hive?

You need to map the file in HDFS to a Hive table. First you create the table metadata in
HCatalog (the Hive metastore); Hive then runs MapReduce jobs to answer queries against it.

Maybe this will help you.

2. Does Hive store all the data in its tables permanently after the analysis?

Hive itself never stores any data. The data stays in files on HDFS, and Hive keeps only the table metadata.

3. Is Hive itself a database?

It is just a data-access framework.
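
A rough way to picture these answers: the rows live in a plain file, and column names are applied only when the file is read. The shell sketch below mimics a query for songs by lennon or harrison, with awk standing in for Hive on a small local file. It is an analogy under that assumption, not how Hive actually executes a query.

```shell
#!/bin/sh
# Analogy only: awk stands in for Hive here; Hive would compile the query
# into jobs that read the same kind of delimited file from HDFS or S3.
cat > songs.txt <<'EOF'
1,2,lennon,john,nowhere man
1,3,lennon,john,strawberry fields forever
2,1,mccartney,paul,penny lane
3,1,harrison,george,while my guitar gently weeps
3,2,harrison,george,i want to tell you
3,3,harrison,george,think for yourself
4,1,starr,ringo,octopus's garden
4,2,starr,ringo,with a little help from my friends
EOF

# Treat field 3 as lastname and field 5 as songname at read time, like:
#   select songname from songs where lastname='lennon' OR lastname='harrison'
awk -F',' '$3 == "lennon" || $3 == "harrison" { print $5 }' songs.txt > result.txt
cat result.txt
```

The file itself is never changed or copied into a separate store; the "table" is just a description of how to read it.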


