hive-user mailing list archives

From bejoy...@yahoo.com
Subject Re: Getting Slow Query Performance!
Date Tue, 12 Mar 2013 11:52:29 GMT
Hi

Since you are on a pseudo-distributed/single-node environment, Hadoop MapReduce parallelism
is limited.

You probably have just a few map slots, so map tasks may be sitting in the queue waiting for
others to complete. On a larger cluster your job should be faster.
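To make the map-slot point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes Hadoop 1.x defaults that the thread does not state (64 MB HDFS blocks, one input split per block, and 2 map slots per TaskTracker via mapred.tasktracker.map.tasks.maximum) and treats every map task as taking the same time. The function name map_waves is hypothetical.

```python
# Assumed Hadoop 1.x defaults (not stated in this thread):
# dfs.block.size = 64 MB, mapred.tasktracker.map.tasks.maximum = 2.
ASSUMED_BLOCK_SIZE = 64 * 1024 * 1024
ASSUMED_MAP_SLOTS_PER_NODE = 2
GB = 1024 ** 3


def map_waves(input_bytes, nodes, block_size=ASSUMED_BLOCK_SIZE,
              slots_per_node=ASSUMED_MAP_SLOTS_PER_NODE):
    """Hypothetical helper: estimate map tasks and scheduling 'waves'.

    Simplifications: one input split per HDFS block, equal task durations,
    and every map slot on every node is free for this one job.
    """
    # Integer ceiling division: a partial last block still needs a task.
    map_tasks = (input_bytes + block_size - 1) // block_size
    total_slots = nodes * slots_per_node
    # Tasks beyond the available slots wait in the queue for a later wave.
    waves = (map_tasks + total_slots - 1) // total_slots
    return map_tasks, waves


for nodes in (1, 10):
    tasks, waves = map_waves(30 * GB, nodes)
    print(f"{nodes} node(s): {tasks} map tasks in {waves} wave(s)")
```

Under those assumptions, the roughly 30 GB read by the join becomes about 480 map tasks. A single node can only run two of them at a time (240 waves), while ten such nodes would need about a tenth as many waves.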

Certain SQL queries that utilize indexing would be faster in a relational database server
(such as MySQL) than in Hive.

Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-----Original Message-----
From: Gobinda Paul <gobinda@live.com>
Date: Tue, 12 Mar 2013 15:09:31 
To: user@hive.apache.org <user@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: Getting Slow Query Performance!






I used Sqoop to import 30 GB of data (two tables: employee, approx. 21 GB, and salary,
approx. 9 GB) into Hadoop (single node) via Hive.
I ran a sample query like this:

  SELECT EMPLOYEE.ID, EMPLOYEE.NAME, EMPLOYEE.DEPT, SALARY.AMOUNT
  FROM EMPLOYEE JOIN SALARY
  WHERE EMPLOYEE.ID = SALARY.EMPLOYEE_ID AND SALARY.AMOUNT > 900000;

In Hive it takes approx. 15 minutes, whereas MySQL takes approx. 4.5 minutes to execute the
same query.
CPU: Pentium(R) Dual-Core CPU E5700 @ 3.00GHz
RAM: 2 GB
HDD: 500 GB

Here is my hive-site.xml configuration:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <property>
    <name>hive.hwi.listen.host</name>
    <value>0.0.0.0</value>
    <description>This is the host address the Hive Web Interface will listen on</description>
  </property>
  <property>
    <name>hive.hwi.listen.port</name>
    <value>9999</value>
    <description>This is the port the Hive Web Interface will listen on</description>
  </property>
  <property>
    <name>hive.hwi.war.file</name>
    <value>/lib/hive-hwi-0.9.0.war</value>
    <description>This is the WAR file with the jsp content for Hive Web Interface</description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>-1</value>
    <description>The default number of reduce tasks per job. Typically set to a prime close
      to the number of available hosts. Ignored when mapred.job.tracker is "local". Hadoop set
      this to 1 by default, whereas hive uses -1 as its default value. By setting this property
      to -1, Hive will automatically figure out what should be the number of reducers.</description>
  </property>
  <property>
    <name>hive.exec.reducers.bytes.per.reducer</name>
    <value>1000000000</value>
    <description>size per reducer. The default is 1G, i.e. if the input size is 10G, it will
      use 10 reducers.</description>
  </property>
  <property>
    <name>hive.exec.reducers.max</name>
    <value>999</value>
    <description>max number of reducers will be used. If the one specified in the configuration
      parameter mapred.reduce.tasks is negative, hive will use this one as the max number of
      reducers when automatically determine number of reducers.</description>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive-${user.name}</value>
    <description>Scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
</configuration>
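The reducer-related properties in this file can be summarized as a small calculation. The Python sketch below reflects a reading of how Hive picks the reducer count when mapred.reduce.tasks is negative: round the job's input size up to whole multiples of hive.exec.reducers.bytes.per.reducer, then cap the result at hive.exec.reducers.max. The function name estimate_reducers is hypothetical, and the exact estimation logic may differ between Hive versions.

```python
# Values copied from the hive-site.xml above.
BYTES_PER_REDUCER = 1_000_000_000  # hive.exec.reducers.bytes.per.reducer
MAX_REDUCERS = 999                 # hive.exec.reducers.max


def estimate_reducers(total_input_bytes, bytes_per_reducer=BYTES_PER_REDUCER,
                      max_reducers=MAX_REDUCERS):
    """Hypothetical helper approximating Hive's automatic reducer count."""
    # Round up so leftover bytes still get a reducer of their own.
    reducers = (total_input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    # Use at least one reducer and never more than hive.exec.reducers.max.
    return min(max_reducers, max(1, reducers))


print(estimate_reducers(10_000_000_000))  # the "10G -> 10 reducers" case from the description
print(estimate_reducers(30 * 1024 ** 3))  # roughly the 30 GB joined in this thread
```

With about 30 GB of input this gives 33 reducers. On a single node with only a couple of reduce slots (again assuming the Hadoop 1.x default of 2 for mapred.tasktracker.reduce.tasks.maximum), most of those reducers would also wait in the queue.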

Any ideas?