hive-user mailing list archives

From "r7raul1984@163.com" <r7raul1...@163.com>
Subject Re: Re: sql mapjoin very slow
Date Mon, 31 Aug 2015 02:46:06 GMT

Yes, I am accidentally joining on a Double. These are the join keys in the plan:
keys:
                  0 UDFToDouble(nav_tcdt) (type: double)
                  1 UDFToDouble(site_categ_id) (type: double)
                  2 UDFToDouble(site_categ_id) (type: double)
                  3 UDFToDouble(mg_brand_id) (type: double)
                  4 UDFToDouble(attr_detl_id) (type: double)
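A quick way to catch this before running the query is to scan the EXPLAIN output for join keys of type double, like the ones above. The Python sketch below assumes the plan text follows the `<tag> <expression> (type: <type>)` layout shown in this message. The function name `double_join_keys` is hypothetical, not part of Hive.

```python
import re

# Matches plan lines such as "  0 UDFToDouble(nav_tcdt) (type: double)".
# Assumption: the layout is "<input tag> <key expression> (type: <type>)".
KEY_LINE = re.compile(r"^\s*(\d+)\s+(.*?)\s+\(type:\s*(\w+)\)\s*$")

def double_join_keys(plan_text):
    """Return (input_tag, expression) pairs for join keys typed as double.

    A UDFToDouble(...) wrapper in the expression means Hive inserted an
    implicit conversion; a bare column means the column itself is a double.
    """
    flagged = []
    for line in plan_text.splitlines():
        m = KEY_LINE.match(line)
        if m is None:
            continue  # skips "keys:" and any other non-key lines
        tag, expr, key_type = m.groups()
        if key_type == "double":
            flagged.append((int(tag), expr))
    return flagged

plan = """keys:
  0 UDFToDouble(nav_tcdt) (type: double)
  1 UDFToDouble(site_categ_id) (type: double)
  4 attr_detl_id (type: bigint)
"""

for tag, expr in double_join_keys(plan):
    print(tag, expr)
```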


r7raul1984@163.com
 
From: Gopal Vijayaraghavan
Date: 2015-08-29 01:45
To: user
Subject: Re: sql mapjoin very slow
> I have a question. I use Hive 1.1.0, so the hive.stats.dbclass default
> value is fs, meaning statistics are stored in the local filesystem. Can
> anyone tell me the file path where the statistics are stored?
 
The statistics aren't stored in the filesystem long term; their final
destination is the metastore.
 
The earlier default stats implementation used MapReduce counters. With
hive.stats.dbclass=fs, the stats are passed during ETL through the
filesystem instead of through MR counters.
 
During the ETL phase you'll see a log line like the one below. It just means
the plan writes the target table plus a new location where the stats for the
insert are staged.
 
2015-08-28T01:44:35,581 INFO  [main]: parse.SemanticAnalyzer (SemanticAnalyzer.java:genFileSinkPlan(6629)) - Set stats collection dir : hdfs://
 
The StatsTask on the client side will read this file and update the
metastore.
 
That aside, you might want to check whether you're accidentally joining on a
Double. That was recently reported as a HashMap regression, and it can be
triggered by a join such as
 
join string_col = int_col;
 
Hive compares a string with an int by converting both sides to double. The
workaround is easy: cast the smaller table's join key to the bigger table's type.
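A minimal Python sketch of that workaround, generating the HiveQL join predicate with the explicit cast on the smaller table's side. The helper `join_condition`, the aliases `f` (bigger table) and `d` (smaller table), and the column types are hypothetical placeholders, not names from this thread.

```python
def join_condition(big_alias, big_col, big_type, small_alias, small_col, small_type):
    """Build a HiveQL equality predicate for a join.

    If the key types differ, the smaller table's key is wrapped in
    CAST(... AS <bigger table's type>) so Hive does not have to convert
    both sides to double. All names passed in are caller-supplied.
    """
    left = f"{big_alias}.{big_col}"
    right = f"{small_alias}.{small_col}"
    if small_type.lower() != big_type.lower():
        right = f"CAST({right} AS {big_type.upper()})"
    return f"{left} = {right}"

# Hypothetical example: the bigger table has a string key, the smaller an int key.
print(join_condition("f", "string_col", "string", "d", "int_col", "int"))
# f.string_col = CAST(d.int_col AS STRING)
```

Casting an int to STRING makes the comparison textual, so a string key such as '007' no longer matches the integer 7; check that the key formats agree before relying on the cast.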
 
Cheers,
Gopal
 
 
 
 