pig-user mailing list archives

From jr <johannes.rus...@io-consulting.net>
Subject running bigger pig jobs on amazon ec2
Date Wed, 08 Dec 2010 14:09:09 GMT
Hi guys,
I'm having some trouble finishing jobs that run smoothly on a smaller
dataset but always fail at 99% when I try to run them on the whole
set.
I can see a few killed map and a few killed reduce tasks, but quite a lot of
failed reduce tasks that all show the same exception at the end.
Here is what I have in the logs:

2010-12-08 08:44:56,127 INFO org.apache.hadoop.mapred.ReduceTask:
Ignoring obsolete output of KILLED map-task:
'attempt_201012080810_0003_m_000009_1'
2010-12-08 08:45:08,152 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201012080810_0003_r_000000_0:
Got 1 new map-outputs
2010-12-08 08:45:13,103 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201012080810_0003_r_000000_0
Scheduled 1 outputs (0 slow hosts and0 dup hosts)
2010-12-08 08:45:13,241 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201012080810_0003_m_000003_0,
compressed len: 3488519, decompressed len: 3488515
2010-12-08 08:45:13,241 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 3488515 bytes
(3488519 raw bytes) into RAM from attempt_201012080810_0003_m_000003_0
2010-12-08 08:45:13,348 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
called (Collection threshold exceeded) init = 5439488(5312K) used = 78403496(76565K) committed
= 101908480(99520K) max = 139853824(136576K)
2010-12-08 08:45:13,404 INFO org.apache.hadoop.mapred.ReduceTask: Read 3488515 bytes from
map-output for attempt_201012080810_0003_m_000003_0
2010-12-08 08:45:13,405 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201012080810_0003_m_000003_0
-> (142, 21) from ip-10-98-71-195.ec2.internal
2010-12-08 08:45:14,241 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting
2010-12-08 08:45:14,241 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined.
2010-12-08 08:45:14,242 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
2010-12-08 08:45:14,253 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge
complete: 2 files left.
2010-12-08 08:45:14,254 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete:
64 files left.
2010-12-08 08:45:14,312 INFO org.apache.hadoop.mapred.Merger: Merging 64 sorted segments
2010-12-08 08:45:14,313 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass,
with 64 segments left of total size: 82947024 bytes
2010-12-08 08:45:15,389 INFO org.apache.hadoop.mapred.ReduceTask: Merged 64 segments, 82947024
bytes to disk to satisfy reduce memory limit
2010-12-08 08:45:15,390 INFO org.apache.hadoop.mapred.ReduceTask: Merging 3 files, 214514578
bytes from disk
2010-12-08 08:45:15,392 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes
from memory into reduce
2010-12-08 08:45:15,392 INFO org.apache.hadoop.mapred.Merger: Merging 3 sorted segments
2010-12-08 08:45:15,397 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass,
with 3 segments left of total size: 214514566 bytes
2010-12-08 08:45:15,489 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native
gpl library
2010-12-08 08:45:15,522 INFO com.hadoop.compression.lzo.LzoCodec: Successfully loaded &
initialized native-lzo library [hadoop-lzo rev 3e7c9dcf0ea0acbde146cb22b236978b344c5525]
2010-12-08 08:45:15,530 INFO com.twitter.elephantbird.pig.load.LzoBaseRegexLoader: LzoBaseRegexLoader
created.
2010-12-08 08:45:15,534 INFO com.twitter.elephantbird.pig.load.LzoBaseRegexLoader: LzoBaseRegexLoader
created.
2010-12-08 08:45:15,544 INFO com.twitter.elephantbird.pig.load.LzoBaseRegexLoader: LzoBaseRegexLoader
created.
2010-12-08 08:45:15,562 INFO com.twitter.elephantbird.pig.load.LzoBaseRegexLoader: LzoBaseRegexLoader
created.
2010-12-08 08:45:15,564 INFO com.twitter.elephantbird.pig.load.LzoBaseRegexLoader: LzoBaseRegexLoader
created.
2010-12-08 08:45:15,568 INFO com.twitter.elephantbird.pig.load.LzoBaseRegexLoader: LzoBaseRegexLoader
created.
2010-12-08 08:45:37,233 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 10.98.99.197:50010
2010-12-08 08:45:37,235 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_8615551403563164366_3938
2010-12-08 08:45:43,251 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 10.98.99.197:50010
2010-12-08 08:45:43,251 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_4074920756844442310_4023
2010-12-08 08:45:49,282 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 10.100.226.63:50010
2010-12-08 08:45:49,282 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-681320892856427804_4034
2010-12-08 08:45:55,292 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 10.99.26.80:50010
2010-12-08 08:45:55,292 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6999793088579291779_4039
2010-12-08 08:46:01,294 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException:
Unable to create new block.
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2812)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)

2010-12-08 08:46:01,294 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-6999793088579291779_4039
bad datanode[1] nodes == null
2010-12-08 08:46:01,294 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations.
Source file "/tmp/temp664356070/tmp-973959386/_temporary/_attempt_201012080810_0003_r_000000_0/winrar/output/extlink/2010-04/2010-04-00000"
- Aborting...
2010-12-08 08:46:01,656 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
org.apache.pig.backend.executionengine.ExecException: ERROR 2135: Received error from store
function.Bad connect ack with firstBadLink 10.99.26.80:50010
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:140)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:254)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:236)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:228)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:254)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:236)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:228)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:423)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:391)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:371)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:239)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.IOException: Bad connect ack with firstBadLink 10.99.26.80:50010
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2870)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2793)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
2010-12-08 08:46:01,659 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the
task

Any idea what this problem is and how to get Hadoop to actually finish
on the larger dataset?
(I have to admit I have a lot of rather small files (<30MB) as input,
but this appears to happen when the reducer tries to write its result.)
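
For what it's worth, the only lead I have so far is that the datanodes might be
running out of transfer threads or file handles once every reducer starts writing
at the same time. That is purely a guess on my part, but raising the xceiver limit
in hdfs-site.xml is what I was going to try next:

  <!-- hdfs-site.xml; values are guesses on my part, not verified -->
  <property>
    <!-- datanode transfer-thread limit (yes, the property name really is spelled this way) -->
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
  <property>
    <!-- number of datanode server threads -->
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
  </property>

I'd also raise the open-file ulimit for the hadoop user on the datanodes to match,
in case it is file handles rather than threads.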

regards,
Johannes

