hive-user mailing list archives

From Sergey Shelukhin <ser...@hortonworks.com>
Subject Re: sql mapjoin very slow
Date Fri, 28 Aug 2015 16:55:29 GMT
Can you check if this is actually being used in your case?

From: "r7raul1984@163.com" <r7raul1984@163.com>
Reply-To: user <user@hive.apache.org>
Date: Friday, August 28, 2015 at 00:53
To: user <user@hive.apache.org>
Subject: Re: Re: sql mapjoin very slow

I found a method in the HashMapWrapper class. I think Hive will use statistics to adjust the
threshold automatically.
    public static int calculateTableSize(
        float keyCountAdj, int threshold, float loadFactor, long keyCount) {
      if (keyCount >= 0 && keyCountAdj != 0) {
        // We have statistics for the table. Size appropriately.
        threshold = (int)Math.ceil(keyCount / (keyCountAdj * loadFactor));
      }
      LOG.info("Key count from statistics is " + keyCount + "; setting map size to " + threshold);
      return threshold;
    }
I have a question. I use Hive 1.1.0, so the default value of hive.stats.dbclass is fs, which I
take to mean that statistics are stored on the local filesystem. Can anyone tell me the file
path where the statistics are stored?
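
For illustration, the sizing rule in the method quoted above can be tried outside Hive. The following is a minimal standalone sketch, not Hive's actual class: the class name MapSizeSketch and the input values in main are hypothetical, and the logging call is dropped so that it compiles without Hive on the classpath.

```java
// Standalone sketch of the sizing rule quoted above; NOT Hive's HashMapWrapper.
// The class name and the sample inputs are hypothetical.
class MapSizeSketch {

    // Same arithmetic as the quoted method, minus the LOG.info call.
    static int calculateTableSize(
            float keyCountAdj, int threshold, float loadFactor, long keyCount) {
        if (keyCount >= 0 && keyCountAdj != 0) {
            // Statistics are available: size the map from the key count.
            threshold = (int) Math.ceil(keyCount / (keyCountAdj * loadFactor));
        }
        return threshold;
    }

    public static void main(String[] args) {
        int configured = 100000; // hypothetical configured initial capacity

        // With a key count of 1000, adjustment 1.0 and load factor 0.5,
        // the map is sized to 1000 / (1.0 * 0.5) = 2000.
        System.out.println(calculateTableSize(1.0f, configured, 0.5f, 1000L));

        // A negative key count means "no statistics": the configured
        // value is returned unchanged.
        System.out.println(calculateTableSize(1.0f, configured, 0.5f, -1L));
    }
}
```

If the statistics are missing (negative key count) or the adjustment is 0, the configured threshold is used as-is, which is why it matters whether statistics are actually being picked up.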

________________________________
r7raul1984@163.com

From: r7raul1984@163.com
Date: 2015-08-28 13:03
To: user <user@hive.apache.org>
Subject: Re: Re: sql mapjoin very slow
I increased hive.hashtable.initialCapacity to 1000000 and decreased hive.hashtable.loadfactor
to 0.5. The query runs faster.
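
As a rough illustration of what these two settings change, the sketch below models how often a java.util.HashMap has to grow while it is being filled. It assumes the documented HashMap behaviour (capacity rounded up to a power of two, table doubled once the entry count exceeds capacity times load factor). The class name ResizeCountSketch, the entry count of 1000000, and the "default" values of 100000 and 0.75 are assumptions for the example, not figures from this thread, and the sketch does not claim to explain the time spent in HashMap.get.

```java
// Simplified model of java.util.HashMap growth; hypothetical helper, not Hive code.
class ResizeCountSketch {

    // Counts how many times the table doubles while inserting `entries` keys.
    static int countResizes(int initialCapacity, float loadFactor, int entries) {
        long capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1; // HashMap rounds the capacity up to a power of two
        }
        int resizes = 0;
        while (entries > capacity * loadFactor) {
            capacity <<= 1; // each resize doubles the table and rehashes
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        int entries = 1000000; // assumed number of small-table keys

        // Assumed defaults (100000, 0.75) versus the values tried above.
        System.out.println(countResizes(100000, 0.75f, entries));
        System.out.println(countResizes(1000000, 0.5f, entries));
    }
}
```

Under these assumptions the larger initial capacity saves a few rehashes during loading; how much of the observed speedup comes from that, rather than from the key distribution, would still need profiling.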

________________________________
r7raul1984@163.com

From: Sergey Shelukhin <sergey@hortonworks.com>
Date: 2015-08-28 09:56
To: user <user@hive.apache.org>
Subject: Re: sql mapjoin very slow
Is the small-side table large, does it have a lot of rows for the same keys, or does it have
a lot of skew?
Are there lots of misses (where there’d be no value in the small table for the large table
value)?

If you have enough memory you can try increasing initial size and decreasing load factor.
Without low-level debugging, though, it’s hard to tell what is wrong if the issue is not one of
the obvious ones (i.e. the above).
If there’s no obvious problem you might consider not using map join.


From: "r7raul1984@163.com" <r7raul1984@163.com>
Reply-To: user <user@hive.apache.org>
Date: Thursday, August 27, 2015 at 18:51
To: user <user@hive.apache.org>
Subject: Re: Re: sql mapjoin very slow

I use MR.
My mapjoin config is shown in the following pictures:
[Two inline images of the mapjoin configuration were attached here.]

________________________________
r7raul1984@163.com

From: Sergey Shelukhin <sergey@hortonworks.com>
Date: 2015-08-28 09:21
To: user <user@hive.apache.org>
Subject: Re: sql mapjoin very slow
Are you using MR or Tez? You could try the optimized hash table in the case of Tez, although it’s
supposed to improve memory use, not necessarily performance.

Can you also share characteristics of the query and data? It is surprising to see so much
time for HashMap.get.

From: "r7raul1984@163.com" <r7raul1984@163.com>
Reply-To: user <user@hive.apache.org>
Date: Thursday, August 27, 2015 at 18:03
To: user <user@hive.apache.org>
Subject: sql mapjoin very slow


When I enable mapjoin, the mapjoin task runs very slowly. My environment is Hadoop 2.3.0 and
Hive 1.1.0.

Attached are the Hive log of one map task and that task's xprof log.

In the map's xprof log, I see:
Compiled + native Method
92.3% 643527 + 0 java.util.HashMap.get
2.8% 19856 + 0 java.util.HashMap.put
1.2% 8623 + 0 org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper$GetAdaptor.setFromRow
0.1% 953 + 0 org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate
0.1% 576 + 0 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject

________________________________
r7raul1984@163.com