ignite-user mailing list archives

From Alena Melnikova <al...@74.ru>
Subject Re: HDP, Hive + Ignite
Date Tue, 02 May 2017 14:45:02 GMT
Hi Ivan,

I have some progress)

*1. TEZ on Ignite (with IGFS, without Ignite MR)*
I was able to run Hive queries on TEZ with Ignite using the following settings:
$IGNITE_HOME/bin/ignite.sh -v -J"-Xms10g -Xmx10g -XX:MaxMetaspaceSize=4g"
(every server has 16 GB of RAM)
beeline  --hiveconf fs.default.name=igfs://dev-dn1:10500 --hiveconf
ignite.job.shared.classloader=false
set tez.use.cluster.hadoop-libs = true; (to avoid
"java.lang.ClassNotFoundException: Class
org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem not found")
ignite.job.shared.classloader = false; 
hive.rpc.query.plan = true;
hive.execution.engine = tez;
select calday, count(*) from price.toprice where calday between '2017-03-01'
and '2017-03-21' group by calday order by calday;

I ran this query 8 times on TEZ+Ignite and 8 times on TEZ alone (without
IGFS), discarded the best and worst results, and calculated the average.
Results are:
Average execution time TEZ+Ignite: 25 sec
Average execution time just TEZ: 23 sec
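
For anyone repeating this comparison, the averaging described above (drop the best and the worst run, then take the mean of the rest) can be sketched as follows. This is only an illustration: the function name trimmed_mean and the sample timings are hypothetical placeholders, not the measurements from this thread.

```python
from statistics import mean


def trimmed_mean(timings_sec):
    """Average the timings after dropping one best and one worst run."""
    if len(timings_sec) < 3:
        raise ValueError("need at least 3 runs to drop the best and the worst")
    ordered = sorted(timings_sec)
    # ordered[0] is the fastest run and ordered[-1] the slowest; skip both.
    return mean(ordered[1:-1])


if __name__ == "__main__":
    # Hypothetical placeholder timings in seconds, NOT data from this thread.
    example_runs = [24, 26, 25, 31, 22, 25, 24, 26]
    print("trimmed average, sec:", trimmed_mean(example_runs))
```

With only 8 runs per configuration, a gap of one or two seconds between the two averages is probably within run-to-run noise.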

Then I ran a more complex analytical query with joins under the same conditions.
Results are:
Average execution time TEZ+Ignite: 312 sec
Average execution time just TEZ: 313 sec

The results are nearly identical, so I guess IGFS is not being used.
Maybe I should explicitly tell Hive to cache data in IGFS?
Is there any way to verify that Ignite is actually used, other than measuring
execution time?
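
One check that does not depend on timing is to look at where the table data actually lives. Hive normally stores a full URI as the table location (shown in the Location field of "describe formatted"), so a table created on HDFS may still be read from hdfs:// even when fs.default.name points at IGFS. The sketch below only classifies such a location string by its URI scheme; the helper names and the example hdfs path are hypothetical, and it assumes you paste in the Location value yourself.

```python
from urllib.parse import urlparse


def storage_scheme(location):
    """Return the lower-cased URI scheme of a table location, e.g. 'hdfs' or 'igfs'."""
    return urlparse(location).scheme.lower()


def is_on_igfs(location):
    """True if the location URI uses the igfs:// scheme."""
    return storage_scheme(location) == "igfs"


if __name__ == "__main__":
    # Hypothetical locations for illustration; substitute the Location value
    # reported by "describe formatted price.toprice".
    for loc in (
        "hdfs://namenode.example:8020/apps/hive/warehouse/price.db/toprice",
        "igfs://dev-dn1:10500/apps/hive/warehouse/price.db/toprice",
    ):
        print(loc, "-> on IGFS:", is_on_igfs(loc))
```

If the location turns out to be an hdfs:// URI, that would be consistent with the identical timings above, although it does not by itself prove that IGFS is bypassed.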


*2. Ignite MR (with IGFS, with Ignite MR)*
I was able to run Hive queries on Ignite MR using the following settings:
$IGNITE_HOME/bin/ignite.sh -v -J"-Xms10g -Xmx10g -XX:MaxMetaspaceSize=4g"
(every server has 16 GB of RAM)
beeline  --hiveconf fs.default.name=igfs://dev-dn1:10500 --hiveconf
ignite.job.shared.classloader=false
ignite.job.shared.classloader = false; 
mapreduce.jobtracker.address=dev-dn1.co.vectis.local:11211;
hive.rpc.query.plan = true;
hive.execution.engine = mr;
select calday, count(*) from price.toprice where calday between '2017-03-01'
and '2017-03-21' group by calday order by calday;

With a single Ignite node the query returns the correct answer, but it is much
slower: 80 sec vs 23 sec on TEZ.
With two or more nodes the result is incorrect, and I do not see any errors
in the logs.
What is wrong?
ignite-node-dn1.log
<http://apache-ignite-users.70518.x6.nabble.com/file/n12344/ignite-node-dn1.log>  
ignite-node-dn2.log
<http://apache-ignite-users.70518.x6.nabble.com/file/n12344/ignite-node-dn2.log>  

3. When I start Ignite nodes on different servers, sometimes they do not see
each other. I have to restart a node a few times before they join a single
cluster. Is this normal?




--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/HDP-Hive-Ignite-tp12195p12344.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
