hive-user mailing list archives

From Elaine Gan <>
Subject help on failed MR jobs (big hive files)
Date Wed, 12 Dec 2012 09:43:31 GMT

I'm trying to run a program on Hadoop.

[Input] tsv file

My program does the following:
(1) Load the tsv into Hive:
      LOAD DATA LOCAL INPATH 'tsvfile' OVERWRITE INTO TABLE A PARTITION (xx=...)
(2) Pull the last 30 days of data into a second table:
      INSERT OVERWRITE TABLE B
      SELECT a, b, c FROM A
      WHERE datediff(to_date(from_unixtime(unix_timestamp('${logdate}'))), request_date) <= 30
(3) Run Mahout on the output of (2)
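
For what it's worth, the window test in step (2) behaves as I expect when I try it with literal dates (the dates below are made up, just to illustrate the <= 30 check):

```sql
-- datediff(end, start) returns the number of days between two date strings
SELECT datediff('2012-12-12', '2012-11-20');
-- 22 days, so a row with this request_date would pass the <= 30 filter
```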

In step (2), I am trying to retrieve the past month's data from Hive.
My Hadoop job always stops here.
When I check it through the web UI, it says:

Diagnostic Info:
# of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201211291541_0262_m_001800

Task attempt_201211291541_0262_m_001800_0 failed to report status for 1802 seconds. Killing!
Error: Java heap space
Task attempt_201211291541_0262_m_001800_2 failed to report status for 1800 seconds. Killing!
Task attempt_201211291541_0262_m_001800_3 failed to report status for 1801 seconds. Killing!

Each Hive table is big, around 6 GB.

(1) Is around 6 GB per Hive table too big?
(2) I've increased my HEAPSIZE to 50G, which I think is far more than enough. Is there
anywhere else I can do the tuning?
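
From what I have read, heap and timeout for individual map tasks are controlled per job rather than by the global heap setting, along these lines (assuming Hadoop 1.x property names; the values are only examples, not my real config):

```sql
-- Per-job overrides from the Hive CLI.
-- Note: if HEAPSIZE above means HADOOP_HEAPSIZE, it sets the daemon/client JVM
-- heap, not the heap of each map task; that is this property instead:
SET mapred.child.java.opts=-Xmx2048m;
-- Milliseconds a task may go without reporting status before it is killed:
SET mapred.task.timeout=1800000;
```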

Thank you.

