hive-user mailing list archives

From Bennie Schut <bsc...@ebuddy.com>
Subject RE: Getting Slow Query Performance!
Date Tue, 12 Mar 2013 11:30:59 GMT
Well, it's probably worth knowing that 30G is really hitting rock bottom when you talk about big
data. Hadoop is linearly scalable, so going to 3 or 4 similar machines could probably get you
below the mysql time, but it's hardly a fair comparison.
For setting it up I would suggest reading the hadoop docs: http://hadoop.apache.org/docs/current/
These hardware specs give you an idea why it's an unusual case: http://hortonworks.com/blog/best-practices-for-selecting-apache-hadoop-hardware/

To give you some hints: each node needs to be configured for how many resources it's allowed
to take. This is a balance between several parameters:
mapred.tasktracker.map.tasks.maximum, mapred.tasktracker.reduce.tasks.maximum, mapred.child.java.opts
There are tons more configurations but this is where you start. Different hardware and different
jobs require different configurations so try it out.
Since you are extremely tight on RAM you probably want to reduce memory usage on most processes
like the namenode/jobtracker/hive, and on each node drop the memory requirements for the tasktracker/datanode.
Also don't put your nodes on 100Mbit links, they are almost always too slow.
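
As a rough illustration of the balance between the three parameters above, here is a minimal Python sketch that estimates how many task slots fit on one node. The helper names and the numbers are assumptions for illustration only (a 2 GB node with 1 GB reserved for the OS and the tasktracker/datanode daemons, and a 200 MB heap per task); they are not recommended values, and a task JVM uses somewhat more memory than its heap.

```python
# Rough memory budget for one Hadoop 1.x worker node.
# All numbers below are illustrative assumptions, not tuning advice.

def parse_heap_mb(java_opts: str) -> int:
    """Extract the -Xmx value, in MB, from a mapred.child.java.opts string."""
    for token in java_opts.split():
        if token.startswith("-Xmx"):
            value = token[4:].lower()
            if value.endswith("g"):
                return int(value[:-1]) * 1024
            if value.endswith("m"):
                return int(value[:-1])
    raise ValueError("no -Xmx setting with an m/g suffix found in: " + java_opts)


def max_task_slots(node_ram_mb: int, reserved_mb: int, child_java_opts: str) -> int:
    """Upper bound on (map slots + reduce slots) whose heaps fit in the RAM
    left over after reserving memory for the OS and the Hadoop daemons."""
    available_mb = node_ram_mb - reserved_mb
    return max(available_mb // parse_heap_mb(child_java_opts), 0)


# Hypothetical node: 2048 MB RAM, 1024 MB reserved, 200 MB heap per task.
slots = max_task_slots(2048, 1024, "-Xmx200m")
print("map + reduce slots that fit:", slots)
```

The sum of mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum should stay at or below the number this prints; how you split it between maps and reduces depends on the job, so try it out.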

Bennie.

From: Gobinda Paul [mailto:gobinda@live.com]
Sent: Tuesday, March 12, 2013 11:01 AM
To: user@hive.apache.org
Subject: RE: Getting Slow Query Performance!


Thanks for your reply. I am new to hadoop and hive. My goal is to process big data using
hadoop; this is my university project (Data Mining). I need to show that hadoop is better
than mysql in the case of big data (30-100GB+) processing; I know hadoop does that. To do so,
can you please suggest how many nodes are required to show the performance and what type of
configuration is required for each node?


From: bschut@ebuddy.com
To: user@hive.apache.org
CC: gobinda@live.com
Date: Tue, 12 Mar 2013 10:40:33 +0100
Subject: RE: Getting Slow Query Performance!
Generally a single hadoop machine will perform worse than a single mysql machine. People normally
use hadoop when they have so much data that it won't really fit on a single machine and it would
require specialized hardware (stuff like SANs) to run.
30GB of data really isn't that much, and 2GB of RAM is really not what hadoop is designed to
work on. It really likes to have lots of memory.
I also don't see the hadoop configuration files, so perhaps you only have 1 mapper and 1 reducer.
But this is not a typical use case, so I doubt you'll see snappy performance after tweaking
the configs.


