hive-user mailing list archives

From "" <>
Subject Re: Re: sql mapjoin very slow
Date Mon, 31 Aug 2015 02:46:06 GMT

Yes, I am accidentally joining on a Double.
                  0 UDFToDouble(nav_tcdt) (type: double)
                  1 UDFToDouble(site_categ_id) (type: double)
                  2 UDFToDouble(site_categ_id) (type: double)
                  3 UDFToDouble(mg_brand_id) (type: double)
                  4 UDFToDouble(attr_detl_id) (type: double)
From: Gopal Vijayaraghavan
Date: 2015-08-29 01:45
To: user
Subject: Re: sql mapjoin very slow
> I have a question. I use Hive 1.1.0, so the default value of
> hive.stats.dbclass is fs, which means statistics are stored in the
> local filesystem. Can anyone tell me the file path where the
> statistics are stored?

The statistics aren't stored in the file system long term - the final
destination for stats is the metastore.
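
As a rough illustration of where the stats end up, the commands below show one way to inspect them from the Hive CLI. This is only a sketch: the table name my_table is a placeholder and does not come from this thread.

```sql
-- Print the stats transport currently in effect (fs, per the question above).
SET hive.stats.dbclass;

-- Recompute basic table-level stats; the results are saved in the metastore.
-- my_table is a hypothetical table name.
ANALYZE TABLE my_table COMPUTE STATISTICS;

-- Show what the metastore holds for the table. Values such as numRows and
-- totalSize are listed under "Table Parameters" in the output.
DESCRIBE FORMATTED my_table;
```
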
The earlier default stats implementation used MR Counters. With
stats.dbclass=fs, they're passed during ETL via the FileSystem, not the MR
Counters.

You'll see something like this in the ETL phase, which is just a way to
write the target table + a new location where stats for the insert are
written:
2015-08-28T01:44:35,581 INFO  [main]: parse.SemanticAnalyzer
( - Set stats collection dir :
The StatsTask on the client side will read this file and update the
metastore.

That aside, you might want to check if you're accidentally joining on a
Double. That has been recently reported as a HashMap regression & can be
triggered when doing a

join string_col = int_col;

There is an easy workaround: cast the smaller table's join column to the
bigger table's type.
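
A minimal sketch of the check and the workaround, assuming two hypothetical tables, big_t(string_col STRING) and small_t(int_col INT, name STRING). Neither table comes from this thread, and the bigger table is assumed to be the one with the string key.

```sql
-- 1. Check the plan. Join keys wrapped in UDFToDouble(...) mean both
--    sides are being implicitly converted to double before the join.
EXPLAIN
SELECT b.string_col, s.name
FROM big_t b
JOIN small_t s ON b.string_col = s.int_col;

-- 2. Workaround: cast the smaller table's key to the bigger table's type,
--    so both join keys are STRING and no double conversion is needed.
SELECT b.string_col, s.name
FROM big_t b
JOIN small_t s ON b.string_col = CAST(s.int_col AS STRING);
```

One caveat with this choice: a string comparison is stricter than a numeric one. Values such as '007' or '7.0' equal 7 when compared as doubles but not when compared as strings, so the sketch assumes the string column holds plain integer text.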