hadoop-common-user mailing list archives

From George Kousiouris <gkous...@mail.ntua.gr>
Subject Re: Problem with MR job
Date Wed, 21 Sep 2011 14:35:33 GMT

Hi,

Some more logs, specifically from the JobTracker:

2011-09-21 10:22:43,482 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201109211018_0001
2011-09-21 10:22:43,538 ERROR org.apache.hadoop.mapred.JobHistory: Failed creating job history log file for job job_201109211018_0001
java.io.FileNotFoundException: /usr/lib/hadoop-0.20/logs/history/master_1316614721548_job_201109211018_0001_hdfs_Input+Driver+running+over+input%3A+hdfs%3A%2F%2Fmaster%2Fuse (P$
         at java.io.FileOutputStream.open(Native Method)
         at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:189)
         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:185)
         at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:243)
         at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:336)
         at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:369)
         at org.apache.hadoop.mapred.JobHistory$JobInfo.logSubmitted(JobHistory.java:1223)
         at org.apache.hadoop.mapred.JobInProgress$3.run(JobInProgress.java:681)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:396)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
         at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:678)
         at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:4013)
         at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
         at java.lang.Thread.run(Thread.java:662)
2011-09-21 10:22:43,666 ERROR org.apache.hadoop.mapred.JobHistory: Failed to store job conf in the log dir
java.io.FileNotFoundException: /usr/lib/hadoop-0.20/logs/history/master_1316614721548_job_201109211018_0001_conf.xml (Permission denied)
         at java.io.FileOutputStream.open(Native Method)
         at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:189)
         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:185)
         at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:243)
         at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:336)
         at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:369)
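
The "Permission denied" on the conf.xml file suggests that the user running the JobTracker cannot create files in the local history directory. A minimal sketch of how this could be checked, assuming Python is available on the master and the script is run as the same user as the JobTracker (the helper name check_writable is only illustrative, not part of Hadoop):

```python
import os
import tempfile


def check_writable(directory):
    """Try to create a throwaway file in `directory`; return (ok, detail)."""
    if not os.path.isdir(directory):
        return False, "not a directory"
    try:
        # The probe file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory, prefix="write_probe_"):
            pass
    except OSError as exc:
        return False, f"{type(exc).__name__}: {exc.strerror}"
    return True, "writable"


# Example call, using the directory named in the stack traces above:
# print(check_writable("/usr/lib/hadoop-0.20/logs/history"))
```

If the call reports a permission error, comparing the owner and mode of the directory with the JobTracker's user would be the next thing to look at.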


On 9/21/2011 5:15 PM, George Kousiouris wrote:
>
> Hi,
>
> The status seems healthy and the datanodes are live:
> Status: HEALTHY
>  Total size:    118805326 B
>  Total dirs:    31
>  Total files:    38
>  Total blocks (validated):    38 (avg. block size 3126455 B)
>  Minimally replicated blocks:    38 (100.0 %)
>  Over-replicated blocks:    0 (0.0 %)
>  Under-replicated blocks:    9 (23.68421 %)
>  Mis-replicated blocks:        0 (0.0 %)
>  Default replication factor:    1
>  Average block replication:    1.2368422
>  Corrupt blocks:        0
>  Missing replicas:        72 (153.19148 %)
>  Number of data-nodes:        2
>  Number of racks:        1
> FSCK ended at Wed Sep 21 10:06:17 EDT 2011 in 9 milliseconds
>
>
> The filesystem under path '/' is HEALTHY
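
The percentages in this fsck report can be cross-checked against each other. The sketch below only redoes the arithmetic with the figures printed above. The last step is an inference, not something fsck states: it assumes the 29 fully replicated blocks each have exactly one replica (the default replication factor).

```python
# Figures copied from the fsck report above.
total_blocks = 38
under_replicated = 9
default_replication = 1
avg_replication = 1.2368422
missing_replicas = 72

# Replicas that actually exist on the data-nodes.
existing_replicas = round(total_blocks * avg_replication)

# Percentages as fsck appears to compute them.
under_pct = 100 * under_replicated / total_blocks
missing_pct = 100 * missing_replicas / existing_replicas

# Assumption: the non-under-replicated blocks hold one replica each.
fully_replicated = total_blocks - under_replicated
live_per_under = (existing_replicas
                  - fully_replicated * default_replication) / under_replicated
missing_per_under = missing_replicas / under_replicated
implied_target = live_per_under + missing_per_under

print(existing_replicas, round(under_pct, 2), round(missing_pct, 2))
print(live_per_under, missing_per_under, implied_target)
```

Under that assumption, each of the 9 under-replicated blocks has 2 live replicas and is 8 short, which implies a requested replication of 10 for those files. That would be consistent with job submission files (mapred.submit.replication, which defaults to 10 as far as I know), and with only 2 data-nodes they can never be fully replicated.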
>
> The jps command has the following output:
> hdfs@master:~$ jps
> 24292 SecondaryNameNode
> 30010 Jps
> 24109 DataNode
> 23962 NameNode
>
> Shouldn't this have two datanode listings? In our system, one of the 
> datanodes and the namenode are on the same machine, but I seem to remember 
> that in the past even with this setup two datanode listings appeared 
> in the jps output.
>
> Thanks,
> George
>
>
>
>
> On 9/21/2011 5:08 PM, Uma Maheswara Rao G 72686 wrote:
>> Hi,
>>
>>   Did any cluster restart happen? Is your NameNode detecting the 
>> DataNodes as live?
>>   It looks like the DNs have not reported any blocks to the NN yet. 
>> You have 13 blocks persisted in the NameNode namespace. At least 12 
>> blocks should be reported from your DNs. Otherwise it will not come 
>> out of safemode automatically.
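
The figure of 12 blocks can be reproduced from the numbers in the SafeModeException quoted further down (threshold 0.9990, total blocks 13, reported blocks 0). A rough sketch, assuming the NameNode truncates total blocks times threshold to an integer, which matches the numbers in that message; the function name is only illustrative:

```python
def blocks_still_needed(total_blocks, reported_blocks, threshold=0.999):
    """Blocks that must still be reported before safe mode can end."""
    # Assumption: the required count is total * threshold, truncated.
    required = int(total_blocks * threshold)
    return max(required - reported_blocks, 0)


print(blocks_still_needed(13, 0))   # 13 * 0.999 = 12.987, truncated to 12
print(blocks_still_needed(13, 12))  # threshold reached
```

With 0 of 13 blocks reported this gives 12, the same number as in the log message.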
>>
>> Regards,
>> Uma
>> ----- Original Message -----
>> From: George Kousiouris<gkousiou@mail.ntua.gr>
>> Date: Wednesday, September 21, 2011 7:29 pm
>> Subject: Problem with MR job
>> To: "common-user@hadoop.apache.org"<common-user@hadoop.apache.org>
>>
>>> Hi all,
>>>
>>> We are trying to run a mahout job in a hadoop cluster, but we keep
>>> getting the same status. The job passes the initial mahout stages and
>>> when it comes to be executed as a MR job, it seems to be stuck at 0%
>>> progress. Through the UI we see that it is submitted but not running.
>>> After a while it gets killed. In the logs the error shown is this one:
>>>
>>> 2011-09-21 07:47:50,507 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: hdfs://master/var/lib/hadoop-0.20/cache/hdfs/mapred/system
>>> org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /var/lib/hadoop-0.20/cache/hdfs/mapred/system. Name nod$
>>> The reported blocks 0 needs additional 12 blocks to reach the threshold 0.9990 of total blocks 13. Safe mode will be turned off automatically.
>>>          at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:1966)
>>>          at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:1940)
>>>          at org.apache.hadoop.hdfs.server.namenode.NameNode.mkdirs(NameNode.java:770)
>>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>          at java.lang.reflect.Method.invoke(Method.java:597)
>>>
>>>
>>> Some staging files seem to have been created, however.
>>>
>>> I was thinking of sending this to the mahout mailing list, but it
>>> seems to be more of a core hadoop issue.
>>>
>>> We are using the following command to launch the mahout example:
>>> ./mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job \
>>>   --input hdfs://master/user/hdfs/testdata/synthetic_control.data \
>>>   --output hdfs://master/user/hdfs/testdata/output \
>>>   --t1 0.5 --t2 1 --maxIter 50
>>>
>>> Any clues?
>>> George
>>>
>>>
>>
>
>


-- 

---------------------------

George Kousiouris
Electrical and Computer Engineer
Division of Communications,
Electronics and Information Engineering
School of Electrical and Computer Engineering
Tel: +30 210 772 2546
Mobile: +30 6939354121
Fax: +30 210 772 2569
Email: gkousiou@mail.ntua.gr
Site: http://users.ntua.gr/gkousiou/

National Technical University of Athens
9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece

