pig-user mailing list archives

From Ashutosh Chauhan <hashut...@apache.org>
Subject Re: running bigger pig jobs on amazon ec2
Date Wed, 08 Dec 2010 17:11:39 GMT
From the logs it looks like the issue is not with Pig but with your HDFS.
Either your HDFS is running out of space, or some (or all) nodes in
your cluster can't talk to each other (a network issue?).
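
For example, something along these lines should narrow it down (just a
sketch, assuming a Hadoop 0.20-style setup with the hadoop client
configured on the node you run it from; adjust hosts to your cluster):

  # overall HDFS capacity and per-datanode usage
  hadoop dfsadmin -report

  # basic filesystem health / block report
  hadoop fsck /

  # from one worker node, check that the datanode port from the error
  # is reachable (10.99.26.80:50010 is one of the addresses in your log)
  telnet 10.99.26.80 50010

If the dfsadmin report shows datanodes close to full, it's a space
problem; if telnet can't connect from one node to another, it's a
network issue between the nodes (on EC2 often the security group).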

Ashutosh
On Wed, Dec 8, 2010 at 06:09, jr <johannes.russek@io-consulting.net> wrote:
> Hi guys,
> I'm having some trouble finishing jobs that run smoothly on a smaller
> dataset but always fail at 99% when I try to run the job on the whole
> set.
> I can see a few killed map tasks and a few killed reduce tasks, but quite a lot of
> failed reduce tasks that all show the same exception at the end.
> Here is what I have in the logs:
>
> 2010-12-08 08:44:56,127 INFO org.apache.hadoop.mapred.ReduceTask:
> Ignoring obsolete output of KILLED map-task:
> 'attempt_201012080810_0003_m_000009_1'
> 2010-12-08 08:45:08,152 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201012080810_0003_r_000000_0: Got 1 new map-outputs
> 2010-12-08 08:45:13,103 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201012080810_0003_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
> 2010-12-08 08:45:13,241 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201012080810_0003_m_000003_0, compressed len: 3488519, decompressed len: 3488515
> 2010-12-08 08:45:13,241 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 3488515 bytes (3488519 raw bytes) into RAM from attempt_201012080810_0003_m_000003_0
> 2010-12-08 08:45:13,348 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 5439488(5312K) used = 78403496(76565K) committed = 101908480(99520K) max = 139853824(136576K)
> 2010-12-08 08:45:13,404 INFO org.apache.hadoop.mapred.ReduceTask: Read 3488515 bytes from map-output for attempt_201012080810_0003_m_000003_0
> 2010-12-08 08:45:13,405 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201012080810_0003_m_000003_0 -> (142, 21) from ip-10-98-71-195.ec2.internal
> 2010-12-08 08:45:14,241 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting
> 2010-12-08 08:45:14,241 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined.
> 2010-12-08 08:45:14,242 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
> 2010-12-08 08:45:14,253 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 2 files left.
> 2010-12-08 08:45:14,254 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 64 files left.
> 2010-12-08 08:45:14,312 INFO org.apache.hadoop.mapred.Merger: Merging 64 sorted segments
> 2010-12-08 08:45:14,313 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 64 segments left of total size: 82947024 bytes
> 2010-12-08 08:45:15,389 INFO org.apache.hadoop.mapred.ReduceTask: Merged 64 segments, 82947024 bytes to disk to satisfy reduce memory limit
> 2010-12-08 08:45:15,390 INFO org.apache.hadoop.mapred.ReduceTask: Merging 3 files, 214514578 bytes from disk
> 2010-12-08 08:45:15,392 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
> 2010-12-08 08:45:15,392 INFO org.apache.hadoop.mapred.Merger: Merging 3 sorted segments
> 2010-12-08 08:45:15,397 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 214514566 bytes
> 2010-12-08 08:45:15,489 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library
> 2010-12-08 08:45:15,522 INFO com.hadoop.compression.lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 3e7c9dcf0ea0acbde146cb22b236978b344c5525]
> 2010-12-08 08:45:15,530 INFO com.twitter.elephantbird.pig.load.LzoBaseRegexLoader: LzoBaseRegexLoader created.
> 2010-12-08 08:45:15,534 INFO com.twitter.elephantbird.pig.load.LzoBaseRegexLoader: LzoBaseRegexLoader created.
> 2010-12-08 08:45:15,544 INFO com.twitter.elephantbird.pig.load.LzoBaseRegexLoader: LzoBaseRegexLoader created.
> 2010-12-08 08:45:15,562 INFO com.twitter.elephantbird.pig.load.LzoBaseRegexLoader: LzoBaseRegexLoader created.
> 2010-12-08 08:45:15,564 INFO com.twitter.elephantbird.pig.load.LzoBaseRegexLoader: LzoBaseRegexLoader created.
> 2010-12-08 08:45:15,568 INFO com.twitter.elephantbird.pig.load.LzoBaseRegexLoader: LzoBaseRegexLoader created.
> 2010-12-08 08:45:37,233 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.98.99.197:50010
> 2010-12-08 08:45:37,235 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_8615551403563164366_3938
> 2010-12-08 08:45:43,251 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.98.99.197:50010
> 2010-12-08 08:45:43,251 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_4074920756844442310_4023
> 2010-12-08 08:45:49,282 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.100.226.63:50010
> 2010-12-08 08:45:49,282 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-681320892856427804_4034
> 2010-12-08 08:45:55,292 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.99.26.80:50010
> 2010-12-08 08:45:55,292 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6999793088579291779_4039
> 2010-12-08 08:46:01,294 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2812)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
>
> 2010-12-08 08:46:01,294 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-6999793088579291779_4039 bad datanode[1] nodes == null
> 2010-12-08 08:46:01,294 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/tmp/temp664356070/tmp-973959386/_temporary/_attempt_201012080810_0003_r_000000_0/winrar/output/extlink/2010-04/2010-04-00000" - Aborting...
> 2010-12-08 08:46:01,656 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
> org.apache.pig.backend.executionengine.ExecException: ERROR 2135: Received error from store function.Bad connect ack with firstBadLink 10.99.26.80:50010
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:140)
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:254)
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:236)
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:228)
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:254)
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:236)
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:228)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:423)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:391)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:371)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:239)
>        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: java.io.IOException: Bad connect ack with firstBadLink 10.99.26.80:50010
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2870)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2793)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
> 2010-12-08 08:46:01,659 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
>
> Any idea what this problem is and how to make Hadoop actually finish
> larger datasets?
> (I have to admit I have a lot of rather small files (<30MB) as input,
> but this appears to happen when the reducer tries to write its result.)
>
> regards,
> Johannes
>
>
