hadoop-yarn-dev mailing list archives

From lohit <lohit.vijayar...@gmail.com>
Subject Re: BIG jobs on YARN
Date Tue, 08 Jan 2013 19:13:52 GMT
Digging a little further, I saw that the problem was with the
config mapreduce.jobtracker.split.metainfo.maxsize.
In the 2.0 documentation that config is listed as
mapreduce.*job*.split.metainfo.maxsize,
while the code refers to mapreduce.jobtracker.split.metainfo.maxsize.
After setting mapreduce.jobtracker.split.metainfo.maxsize to a higher value I
could get the job running.
I will open a JIRA for this.
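For reference, a minimal sketch of the workaround in mapred-site.xml (it could equally be passed with -D on the command line); the 100 MB value is illustrative, not a recommendation:

```xml
<!-- Raise the limit on the job.splitmetainfo file size.
     Note: in 2.0 the code reads the "jobtracker" variant of the key,
     not the "job" variant shown in the documentation.
     104857600 bytes (100 MB) is an illustrative value. -->
<property>
  <name>mapreduce.jobtracker.split.metainfo.maxsize</name>
  <value>104857600</value>
</property>
```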

2013/1/7 Lohit <lohit.vijayarenu@gmail.com>

> It is easily reproducible. Generate 10 TB of input data using
> teragen (replication 3) and try to run terasort on that input. The first
> container fails without any information in the logs and the job fails.
>
> Lohit
>
> On Jan 7, 2013, at 6:41 AM, Robert Evans <evans@yahoo-inc.com> wrote:
>
> > We have run some very large jobs on top of YARN, but have not run into
> > this issue yet.  The fact that the job.jar was not symlinked correctly
> > makes me think this is a YARN distributed cache issue and not really an
> > input split issue.  How reproducible is this?  Does it happen every time
> > you run the job, or did it just happen once?  Could you take a look at the
> > node manager logs to see if anything shows issues while launching?  Sadly
> > the node manager does not log everything when downloading the application
> > and private distributed caches, so there could be an error in there where
> > it did not create the symlink and failed to fail :).
> >
> > --Bobby
> >
> > On 1/5/13 2:44 PM, "lohit" <lohit.vijayarenu@gmail.com> wrote:
> >
> >> Hi Devs,
> >>
> >> Has anyone seen issues when running big jobs on YARN?
> >> I am trying a 10 TB terasort where the input is 3-way replicated. This
> >> generates job.split and job.splitmetainfo files of more than 10MB. I see
> >> that the first container launched crashes without any error files.
> >> Debugging a little, I see that the job.jar symlink is not created
> >> properly, which is strange.
> >> If I try the same 10 TB terasort with the input one-way replicated, the
> >> job runs fine. job.split and job.splitmetainfo are much smaller in this
> >> case, which makes me believe there is some kind of limit I might be
> >> hitting.
> >> I tried setting mapreduce.job.split.metainfo.maxsize to 100M, but that
> >> did not help.
> >> Any experience running big jobs, and any related configs you use?
> >>
> >> --
> >> Have a Nice Day!
> >> Lohit
> >
>



-- 
Have a Nice Day!
Lohit
