hive-user mailing list archives

From yongqiang he <heyongqiang...@gmail.com>
Subject Re: Hive map join - process a little larger tables with moderate number of rows
Date Thu, 31 Mar 2011 23:25:03 GMT
You possibly got an OOM (out of memory) error when processing the small
tables. OOM is a fatal error that cannot be controlled by the Hive configs.
Can you try increasing your memory setting?

thanks
yongqiang
On Thu, Mar 31, 2011 at 7:25 AM, Bejoy Ks <bejoy_ks@yahoo.com> wrote:
> Hi Experts
>     I'm currently working with Hive 0.7, mostly with JOINs. In all
> permissible cases I'm using map joins by setting the
> hive.auto.convert.join=true parameter. Using local map joins has made a
> considerable performance improvement in Hive queries. So far I have used
> local map joins only with the default Hive configuration parameters; now
> I'd like to dig deeper into this. I want to try local map joins on
> somewhat bigger tables with more rows. Given below is the failure log of
> one of my local map tasks, which then fell back to its backup common join task.
>
> 2011-03-31 09:56:54     Starting to launch local task to process map join;      maximum memory = 932118528
> 2011-03-31 09:56:57     Processing rows:        200000  Hashtable size: 199999  Memory usage:   115481024       rate:   0.124
> 2011-03-31 09:57:00     Processing rows:        300000  Hashtable size: 299999  Memory usage:   169344064       rate:   0.182
> 2011-03-31 09:57:03     Processing rows:        400000  Hashtable size: 399999  Memory usage:   232132792       rate:   0.249
> 2011-03-31 09:57:06     Processing rows:        500000  Hashtable size: 499999  Memory usage:   282338544       rate:   0.303
> 2011-03-31 09:57:10     Processing rows:        600000  Hashtable size: 599999  Memory usage:   336738640       rate:   0.361
> 2011-03-31 09:57:14     Processing rows:        700000  Hashtable size: 699999  Memory usage:   391117888       rate:   0.42
> 2011-03-31 09:57:22     Processing rows:        800000  Hashtable size: 799999  Memory usage:   453906496       rate:   0.487
> 2011-03-31 09:57:27     Processing rows:        900000  Hashtable size: 899999  Memory usage:   508306552       rate:   0.545
> 2011-03-31 09:57:34     Processing rows:        1000000 Hashtable size: 999999  Memory usage:   562706496       rate:   0.604
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapredLocalTask
> ATTEMPT: Execute BackupTask: org.apache.hadoop.hive.ql.exec.MapRedTask
> Launching Job 4 out of 6
>
>
> Here I'd like to get this local map task to run. To that end I tried
> setting the following Hive parameters:
> hive -f  HiveJob.txt -hiveconf hive.mapjoin.maxsize=1000000 -hiveconf
> hive.mapjoin.smalltable.filesize=40000000 -hiveconf
> hive.auto.convert.join=true
> But setting these two config parameters doesn't make my local map task
> proceed beyond this stage. I didn't try overriding
> hive.mapjoin.localtask.max.memory.usage=0.90 because my task log shows
> that the memory usage rate is just 0.604, so I assume setting it to a
> larger value won't solve my problem. Could someone please tell me which
> parameters and values I should set to get things rolling?
>
> Thank You
>
> Regards
> Bejoy.K.S
>
>
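
As a quick sanity check on the figures in the quoted log, the sketch below recomputes the "rate" column as memory usage divided by the reported maximum memory, and estimates the in-memory cost per row. That the rate is computed this way is an assumption inferred from the numbers themselves, not from the Hive source; the function names are illustrative only.

```python
# Figures copied from the local-task log quoted above.
MAX_MEMORY = 932118528  # "maximum memory" reported when the local task starts

# (rows processed, memory usage in bytes, rate printed in the log)
SAMPLES = [
    (200000, 115481024, 0.124),
    (300000, 169344064, 0.182),
    (400000, 232132792, 0.249),
    (500000, 282338544, 0.303),
    (600000, 336738640, 0.361),
    (700000, 391117888, 0.42),
    (800000, 453906496, 0.487),
    (900000, 508306552, 0.545),
    (1000000, 562706496, 0.604),
]


def usage_rate(used, maximum=MAX_MEMORY):
    """Fraction of the maximum memory in use (assumed meaning of 'rate')."""
    return used / maximum


def bytes_per_row(rows, used):
    """Average memory consumed per processed row at this point in the log."""
    return used / rows


if __name__ == "__main__":
    for rows, used, logged in SAMPLES:
        print(
            f"{rows:>8} rows  rate={usage_rate(used):.3f} (log: {logged})  "
            f"~{bytes_per_row(rows, used):.0f} bytes/row"
        )
```

Under that assumption the recomputed rates match the logged ones to three decimals, and the last logged value (0.604) is well below the 0.90 threshold mentioned in the question, which is consistent with the observation that raising that threshold alone would probably not help.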
