hive-issues mailing list archives

From "Gopal V (JIRA)"
Subject [jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?
Date Tue, 15 Dec 2015 22:44:46 GMT


Gopal V commented on HIVE-12683:

Pretty sure all of that math is for 10k RPM disks - SSDs don't exactly follow the same rules.

For r3.8xl, my recommendation from measurement was 24 containers x 8 GB each for optimum
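
As a rough sketch only, that sizing could be expressed as session settings along these lines. The 8192 figure is simply 8 GB in MB. The heap value assumes the common rule of thumb of about 80% of the container size, which is not stated in the comment above. 24 such containers add up to 192 GB per node, which fits within the 244 GiB of an r3.8xlarge. The node-level YARN limit (yarn.nodemanager.resource.memory-mb, around 24 x 8192 = 196608) would have to be set in yarn-site.xml rather than per session.

```sql
-- Assumption: 8 GB containers, expressed in MB.
set hive.tez.container.size=8192;
-- Assumption: heap at roughly 80% of the container (0.8 * 8192 is about 6553 MB).
set hive.tez.java.opts=-Xmx6553m;
```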

EMR's default configs for Hive might not be the best for Tez; you might want to reconfigure
hive-site.xml based on the Ambari default install instead.

Something like your Test 2 might be OOM'ing due to lack of S3 file closures - instead of increasing
the memory Xmx so high, you might want to turn on the scalable partitioned insert from HIVE-6455.
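
A minimal sketch of turning that on for a session, assuming the switch in question is the hive.optimize.sort.dynamic.partition property that HIVE-6455 introduced (its default differs between Hive versions). With it enabled, rows are sorted by partition key before the write, so each task keeps only one partition's file writer open at a time instead of one per partition.

```sql
-- Assumption: this is the property added by HIVE-6455; check the default for your Hive version.
set hive.optimize.sort.dynamic.partition=true;
```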

Most of what you describe here isn't necessarily a bug.

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> ----------------------------------------------------------
>                 Key: HIVE-12683
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: rohit garg
> We have started testing the Tez query engine. In initial results, we get a
30% performance boost over Hive on smaller data sets (1-10 GB), but Hive starts to perform better
than Tez as data size increases. For example, when we run a Hive query with Tez on about 2.3 TB
of data, it performs about 20% worse than Hive alone. Details are in the post
> On a cluster with 1.3 TB RAM, I set the following property :
> set tez.task.resource.memory.mb=10000; set; set
=-Xmx47364m; set hive.tez.container.size=59205; set; set;
> Is this normal, or am I missing or misconfiguring some property?
Also, I am using an older version of Tez as of now. Could that be the issue too? I still have
to bootstrap the latest version of Tez on EMR and test whether it does any better.
> Thought of asking here too
