hive-dev mailing list archives

From "Chen, Yanlin" <yac...@ea.com>
Subject Disc out of space error
Date Fri, 13 Jun 2014 15:40:54 GMT
Hi,

One of my jobs keeps failing with "FSError: java.io.IOException: No space left on device".
Some tasks fail with

org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/file.out at .... on Host node72-142.prod-aws.eadpdata.ea.com

or

org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_201405211957_566618_m_000001_0/intermediate.34 at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
at ...

The nodes where the tasks failed don't look that full. The query and the counters for this job are included below.
The job does an inner self-join in a subquery and then aggregates the result.

Does anybody know why the job fails with out-of-space errors while the nodes still have
free space?
And is there any way to optimize the query itself, besides cleaning up disk space?

Thanks a lot!


SET mapred.max.split.size=134217728;
SET mapred.min.split.size.per.node=100000000;
SET mapred.min.split.size.per.rack=100000000;

CREATE EXTERNAL TABLE IF NOT EXISTS mpst.score_per_min_v2
(
game_name STRING,
hosted_platform STRING,
s_kit STRING,
vehicle STRING,
score_amt FLOAT,
min_spent FLOAT,
score_per_min FLOAT
)
PARTITIONED BY (load_datetime STRING)
STORED AS RCFILE
LOCATION '/hive/warehouse/mpst/score_per_min_v2';

INSERT OVERWRITE TABLE score_per_min_v2 PARTITION(load_datetime='2014-06-09 23-58-00')
SELECT game_name, hosted_platform,
CASE WHEN s_kit IS NOT NULL THEN s_kit ELSE "NA" END AS s_kit,
vehicle,
SUM(score_amt),
SUM(time_duration/60) AS min_spent,
CASE WHEN SUM(time_duration/60)=0 THEN 0.0 ELSE round(SUM(score_amt)/SUM(time_duration/60),2) END AS score_per_min
FROM
(
SELECT
c.round_guid AS round_guid,
c.persona_id AS persona_id,
c.player_id AS player_id,
c.round_start_datetime AS round_start_datetime,
c.s_kit AS s_kit,
c.vehicle AS vehicle,
a.round_time AS start_time,
c.round_time AS end_time,
(c.round_time - a.round_time) AS time_duration,
c.score_amt,
c.hosted_platform,
c.game_name
FROM
mpst.spm_stg_v2 c
INNER JOIN
mpst.spm_stg_v2 a
ON
a.dt = '2014-06-10' AND c.dt = '2014-06-10' AND a.dt = c.dt
AND a.service = c.service AND a.hour = c.hour
AND a.round_guid = c.round_guid AND a.player_id = c.player_id AND a.hosted_platform = c.hosted_platform
AND a.persona_id = c.persona_id AND a.player_id = c.player_id
AND a.round_start_datetime = c.round_start_datetime
AND a.rank = (c.rank - 1)
) x
GROUP BY game_name, hosted_platform, s_kit, vehicle;


Map-Reduce Framework

Counter                                           Map           Reduce            Total
Map output materialized bytes         173,033,990,918                0  173,033,990,918
Map input records                         555,343,308                0      555,343,308
Reduce shuffle bytes                                0  173,033,990,918  173,033,990,918
Spilled Records                         4,188,988,304    1,350,009,594    5,538,997,898
Map output bytes                      169,705,718,344                0  169,705,718,344
Total committed heap usage (bytes)      3,002,007,552      553,385,984    3,555,393,536
CPU time spent (ms)                        26,347,260       10,932,050       37,279,310
Map input bytes                         1,275,536,063                0    1,275,536,063
SPLIT_RAW_BYTES                                13,493                0           13,493
Combine input records                               0                0                0
Reduce input records                                0    1,110,686,616    1,110,686,616
Reduce input groups                                 0    1,110,686,616    1,110,686,616
Combine output records                              0                0                0
Physical memory (bytes) snapshot        3,628,310,528      493,240,320    4,121,550,848
Reduce output records                               0                0                0
Virtual memory (bytes) snapshot        21,354,807,296    4,420,263,936   25,775,071,232
Map output records                      1,110,686,616                0    1,110,686,616
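
As a rough sanity check on the counters above, a few ratios can be computed directly from the posted numbers. This is only back-of-the-envelope arithmetic: the variable names are illustrative, and reading the first value of each counter as the map-side figure is an assumption based on the first and second values summing to the third. It suggests that the intermediate map output, rather than the roughly 1.2 GB of input, is what has to fit in the local directories.

```python
# Counter values copied from the job statistics in this message.
# Assumption: the first value listed for each counter is the map-side figure.
map_input_records = 555_343_308
map_output_records = 1_110_686_616
map_input_bytes = 1_275_536_063
map_output_bytes = 169_705_718_344
map_output_materialized_bytes = 173_033_990_918
map_spilled_records = 4_188_988_304

# Each input row is emitted once per side of the self-join.
records_ratio = map_output_records / map_input_records

# How much larger the intermediate data is than the (stored) input.
bytes_ratio = map_output_bytes / map_input_bytes

# Values above 1 mean records were written to local disk more than once on
# the map side (initial spills plus merge passes).
spills_per_record = map_spilled_records / map_output_records

# Intermediate data that map tasks materialize for the shuffle, in GiB.
materialized_gib = map_output_materialized_bytes / 2**30

print(f"map output records per input record: {records_ratio:.2f}")
print(f"map output bytes per input byte:     {bytes_ratio:.1f}")
print(f"map-side spills per output record:   {spills_per_record:.2f}")
print(f"materialized map output (GiB):       {materialized_gib:.1f}")
```

So every input row appears twice in the map output, the intermediate data is over a hundred times larger than the input bytes, and each map output record is spilled more than once on average.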



Regards,

Y. Chen
--- Perspiration never betray you ---


