pig-user mailing list archives

From Johannes Rußek <johannes.rus...@io-consulting.net>
Subject Re: running bigger pig jobs on amazon ec2
Date Tue, 14 Dec 2010 14:47:46 GMT
Hello Dmitriy,

thanks for the helpful questions. I'll gather all the relevant
information when I kick off another run.
What I can already answer:

The nodes each have 4 CPUs and run at a load of > 19 with about 40-50...
There are 20 nodes, one of which is the namenode.
The storage is just a temporary HDFS that is created on the "local" disks
when the cluster is started each month.
Yes, in fact I'm using a storefunc that writes multiple files (one for
each "primary" key I have in the output).
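To put rough numbers on Dmitriy's datanode-pressure hypothesis, here is a back-of-the-envelope sketch in Python of how a per-key storefunc multiplies concurrent HDFS write streams. It assumes that each running reduce task keeps one output file open per distinct key, that every open file is written through a pipeline of `replication` datanodes, and that pipelines spread evenly over the 19 non-namenode machines. The class `WriteLoad` is hypothetical, and the reducer and key counts in the example are illustrative only, since they are among the numbers still to be gathered.

```python
from dataclasses import dataclass


@dataclass
class WriteLoad:
    """Back-of-the-envelope model of concurrent HDFS writes (hypothetical inputs)."""

    concurrent_reducers: int  # reduce tasks running at once (not yet known here)
    open_files_per_task: int  # distinct keys a task holds open at once (assumed)
    replication: int = 3      # HDFS replication factor; 3 is the usual default
    datanodes: int = 19       # 20 nodes minus the namenode (assumes all others run a datanode)

    def open_files(self) -> int:
        # One HDFS write stream per open output file.
        return self.concurrent_reducers * self.open_files_per_task

    def pipeline_slots(self) -> int:
        # Each stream is written through a pipeline of `replication` datanodes.
        return self.open_files() * self.replication

    def per_datanode(self) -> float:
        # Assumes pipelines are spread evenly over all datanodes.
        return self.pipeline_slots() / self.datanodes


if __name__ == "__main__":
    # Illustrative numbers only: 2 reduce slots on each of 19 nodes, 100 keys per task.
    load = WriteLoad(concurrent_reducers=38, open_files_per_task=100)
    print(load.open_files(), load.pipeline_slots(), load.per_datanode())
    # prints: 3800 11400 600.0
```

Because the stream count in this model grows with the number of distinct keys rather than with data volume, a job like this could run fine on a smaller dataset with fewer keys and still overload the datanodes on the full set. That is a hypothesis to check against the real numbers, not a diagnosis.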

I will send you the rest of the answers as soon as I have gathered the needed information.
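Ashutosh's first suspicion in the quoted thread is that HDFS is running out of space. Since this HDFS lives on the nodes' local disks, one quick per-node check is the free space on those disks. The sketch below uses only the Python standard library; the directory list is a placeholder, so pass whichever local directories your cluster's configuration actually uses for HDFS data and task scratch space. For the cluster-wide HDFS view, `hadoop dfsadmin -report` prints configured capacity and remaining space per datanode.

```python
import shutil


def report_free_space(paths, warn_below=0.10):
    """Print free space for each local directory; return those below `warn_below`.

    `paths` stand in for the local directories that back HDFS and task scratch
    space (hypothetical; use the ones from your own configuration).
    """
    low = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
        if usage.total == 0:
            continue  # skip pseudo-filesystems that report no capacity
        free_fraction = usage.free / usage.total
        print(f"{path}: {usage.free / 2**30:.1f} GiB free ({free_fraction:.0%})")
        if free_fraction < warn_below:
            low.append(path)
    return low


if __name__ == "__main__":
    # "/" is only a placeholder so the sketch runs anywhere; replace it with
    # the cluster's real data directories.
    low_dirs = report_free_space(["/"])
    if low_dirs:
        print("low on space:", ", ".join(low_dirs))
```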

On 12.12.2010 12:18, Dmitriy Ryaboy wrote:
> Johannes,
> I wonder if something is putting enough pressure on the datanodes that they
> are unable to ack all the write requests fast enough, causing many tasks to
> give up due to what amounts to TCP throughput collapse.
> The logs certainly seem to indicate something unhealthy happening at the DFS
> level. A bunch of questions below... I am stabbing in the dark here, as I
> don't run clusters in EC2.
> Do you have any stats on the network traffic in your cluster while this is
> happening?
> The same, but for disk/CPU utilization and similar metrics on the data nodes?
> I am curious why there's a loader being instantiated in the reducer. Can you
> send along a relevant portion of the explain plan?
> How many map tasks and reduce tasks are you running?
> How big is the cluster?
> Is the storefunc you are using doing something like writing multiple files?
> When running a cluster in EC2, what are you using for storage? S3, EBS...?
> D
> On Fri, Dec 10, 2010 at 2:53 AM, jr <johannes.russek@io-consulting.net> wrote:
>> Hello Ashutosh,
>> I'm running entirely on Amazon EC2, and while I get those errors, I seem
>> to be able to access HDFS by using "hadoop fs" :/
>> regards,
>> Johannes
>> On Wednesday, 08.12.2010, 09:11 -0800, Ashutosh Chauhan wrote:
>>> From the logs, it looks like the issue is not with Pig but with your HDFS.
>>> Either your HDFS is running out of space or some (or all) nodes in
>>> your cluster can't talk to each other (network issue?).
>>> Ashutosh
>>> On Wed, Dec 8, 2010 at 06:09, jr <johannes.russek@io-consulting.net> wrote:
>>>> Hi guys,
>>>> I'm having some trouble finishing jobs that run smoothly on a smaller
>>>> dataset but always fail at 99% if I try to run them on the whole
>>>> set.
>>>> I can see a few killed map and a few killed reduce tasks, but quite a lot of
>>>> failed reduce tasks that all show the same exception at the end.
>>>> Here is what I have in the logs:
