  public static void configureIncrementalLoad(Job job, HTable table)
      throws IOException {
    Configuration conf = job.getConfiguration();
    Path partitionsPath = new Path(job.getWorkingDirectory(),
                                   "partitions_" + UUID.randomUUID());
    LOG.info("Writing partition information to " + partitionsPath);
    FileSystem fs = partitionsPath.getFileSystem(conf);
    writePartitions(conf, partitionsPath, startKeys);
    partitionsPath.makeQualified(fs);
Can you check whether the HDFS-related config was passed to the Job correctly?
Thanks
Ok, a bit more info: from what I can tell, the partitions file is being placed into the working dir on the node I launch from, and the task trackers are trying to look for that file, which doesn't exist where they run (since they are on other nodes).
Here is the exception on the TT in case it is helpful:
2013-02-06 17:05:13,002 WARN org.apache.hadoop.mapred.TaskTracker: Exception while localization java.io.FileNotFoundException: File /opt/jobs/MyMapreduceJob/partitions_1360170306728 does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
        at org.apache.hadoop.filecache.TaskDistributedCacheManager.setupCache(TaskDistributedCacheManager.java:179)
        at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1212)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1203)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1118)
        at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2430)
        at java.lang.Thread.run(Thread.java:662)
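One way to picture the symptom described here: a path without a scheme is qualified against the default filesystem, and Hadoop 1.x uses file:/// when fs.default.name is not set in the job's configuration. The sketch below uses only java.net.URI to imitate that qualification step; it does not call Hadoop. The class name, the method names, and the NameNode address are hypothetical, and the real working directory is chosen by Hadoop and usually differs per filesystem, so reusing one path for both cases is a simplification.

```java
import java.net.URI;

/** Hypothetical helper for illustration; not part of Hadoop or HBase. */
final class DefaultFsSketch {
    private DefaultFsSketch() {}

    /**
     * Resolves an absolute, scheme-less path against a default filesystem URI,
     * roughly the way a scheme-less Hadoop path picks up fs.default.name.
     */
    static URI qualify(String defaultFs, String absolutePath) {
        return URI.create(defaultFs).resolve(absolutePath);
    }

    /** True when the qualified URI uses the "file" scheme, i.e. one node's local disk. */
    static boolean isLocal(URI qualified) {
        return "file".equalsIgnoreCase(qualified.getScheme());
    }

    public static void main(String[] args) {
        // Path taken from the TaskTracker exception in this thread.
        String partitions = "/opt/jobs/MyMapreduceJob/partitions_1360170306728";

        // Hadoop 1.x falls back to file:/// when fs.default.name is not set.
        URI unset = qualify("file:///", partitions);
        // Assumed NameNode address, for illustration only.
        URI hdfs = qualify("hdfs://namenode.example:8020/", partitions);

        System.out.println(unset + " local=" + isLocal(unset));
        System.out.println(hdfs + " local=" + isLocal(hdfs));
    }
}
```

If the "Writing partition information to ..." line printed by the job client starts with file:, the partitions file most likely landed on the submitting node's local disk, which would match the FileNotFoundException raised on the task trackers.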
From: Sean McNamara <sean.mcnamara@webtrends.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Wednesday, February 6, 2013 9:35 AM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Re: TaskStatus Exception using HFileOutputFormat
> Using the below construct, do you still get exception ?
Correct, I am still getting this exception.
Sean
From: Ted Yu <yuzhihong@gmail.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Tuesday, February 5, 2013 7:50 PM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Re: TaskStatus Exception using HFileOutputFormat
Using the below construct, do you still get exception ?
Please consider upgrading to hadoop 1.0.4
Thanks
On Tue, Feb 5, 2013 at 4:55 PM, Sean McNamara <Sean.McNamara@webtrends.com> wrote:
> Can you tell us the HBase and hadoop versions you were using ?
Ahh yes, sorry I left that out:
Hadoop: 1.0.3
HBase: 0.92.0
> I guess you have used the above construct
Our code is as follows:
HTable table = new HTable(conf, configHBaseTable);
FileOutputFormat.setOutputPath(job, outputDir);
HFileOutputFormat.configureIncrementalLoad(job, table);
Thanks!
From: Ted Yu <yuzhihong@gmail.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Tuesday, February 5, 2013 5:46 PM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Re: TaskStatus Exception using HFileOutputFormat
Can you tell us the HBase and hadoop versions you were using ?
From TestHFileOutputFormat:
    HFileOutputFormat.configureIncrementalLoad(job, table);
    FileOutputFormat.setOutputPath(job, outDir);
I guess you have used the above construct ?
Cheers
On Tue, Feb 5, 2013 at 4:31 PM, Sean McNamara <Sean.McNamara@webtrends.com> wrote:
We're trying to use HFileOutputFormat for bulk HBase loading. When using HFileOutputFormat's setOutputPath or configureIncrementalLoad, the job is unable to run. The error I see in the jobtracker logs is: Trying to set finish time for task attempt_201301030046_123198_m_000002_0 when no start time is set, stackTrace is : java.lang.Exception
If I remove any references to HFileOutputFormat, and use FileOutputFormat.setOutputPath, things seem to run great. Does anyone know what could be causing the TaskStatus error when using HFileOutputFormat?
Thanks,
Sean
What I see on the Job Tracker:
2013-02-06 00:17:33,685 ERROR org.apache.hadoop.mapred.TaskStatus: Trying to set finish time for task attempt_201301030046_123198_m_000002_0 when no start time is set, stackTrace is : java.lang.Exception
        at org.apache.hadoop.mapred.TaskStatus.setFinishTime(TaskStatus.java:145)
        at org.apache.hadoop.mapred.TaskInProgress.incompleteSubTask(TaskInProgress.java:670)
        at org.apache.hadoop.mapred.JobInProgress.failedTask(JobInProgress.java:2945)
        at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:1162)
        at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:4739)
        at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:3683)
        at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3378)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
What I see from the console:
391  [main] INFO  org.apache.hadoop.hbase.mapreduce.HFileOutputFormat  - Looking up current regions for table org.apache.hadoop.hbase.client.HTable@3a083b1b
1284 [main] INFO  org.apache.hadoop.hbase.mapreduce.HFileOutputFormat  - Configuring 41 reduce partitions to match current region count
1285 [main] INFO  org.apache.hadoop.hbase.mapreduce.HFileOutputFormat  - Writing partition information to file:/opt/webtrends/oozie/jobs/Lab/O/VisitorAnalytics.MapReduce/bin/partitions_1360109875112
1319 [main] INFO  org.apache.hadoop.util.NativeCodeLoader  - Loaded the native-hadoop library
1328 [main] INFO  org.apache.hadoop.io.compress.zlib.ZlibFactory  - Successfully loaded & initialized native-zlib library
1329 [main] INFO  org.apache.hadoop.io.compress.CodecPool  - Got brand-new compressor
1588 [main] INFO  org.apache.hadoop.hbase.mapreduce.HFileOutputFormat  - Incremental table output configured.
2896 [main] INFO  org.apache.hadoop.hbase.mapreduce.TableOutputFormat  - Created table instance for Lab_O_VisitorHistory
2910 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat  - Total input paths to process : 1
Job Name:       job_201301030046_123199
Job URL:        VisitorHistory MapReduce (soozie01.Lab.O)
3141 [main] INFO  org.apache.hadoop.mapred.JobClient  - Running job: job_201301030046_123199
4145 [main] INFO  org.apache.hadoop.mapred.JobClient  -  map 0% reduce 0%
10162 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_m_000002_0, Status : FAILED
10196 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata01.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000002_0&filter=stdout
10199 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata01.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000002_0&filter=stderr
10199 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_r_000042_0, Status : FAILED
10203 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata01.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000042_0&filter=stdout
10205 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata01.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000042_0&filter=stderr
10206 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_m_000002_1, Status : FAILED
10210 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000002_1&filter=stdout
10213 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000002_1&filter=stderr
10213 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_r_000042_1, Status : FAILED
10217 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000042_1&filter=stdout
10219 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000042_1&filter=stderr
10220 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_m_000002_2, Status : FAILED
10224 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000002_2&filter=stdout
10226 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000002_2&filter=stderr
10227 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_r_000042_2, Status : FAILED
10236 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000042_2&filter=stdout
10239 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000042_2&filter=stderr
10239 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_m_000001_0, Status : FAILED
10244 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata02.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000001_0&filter=stdout
10247 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata02.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000001_0&filter=stderr
10247 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_r_000041_0, Status : FAILED
10250 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata02.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000041_0&filter=stdout
10252 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata02.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000041_0&filter=stderr
11255 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_m_000001_1, Status : FAILED
11259 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000001_1&filter=stdout
11262 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000001_1&filter=stderr
11262 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_r_000041_1, Status : FAILED
11265 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000041_1&filter=stdout
11267 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000041_1&filter=stderr
11267 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_m_000001_2, Status : FAILED
11271 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000001_2&filter=stdout
11273 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000001_2&filter=stderr
11274 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_r_000041_2, Status : FAILED
11277 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000041_2&filter=stdout
11279 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000041_2&filter=stderr
11280 [main] INFO  org.apache.hadoop.mapred.JobClient  - Job complete: job_201301030046_123199
11291 [main] INFO  org.apache.hadoop.mapred.JobClient  - Counters: 4
11292 [main] INFO  org.apache.hadoop.mapred.JobClient  -   Job Counters
11292 [main] INFO  org.apache.hadoop.mapred.JobClient  -     SLOTS_MILLIS_MAPS=0
11292 [main] INFO  org.apache.hadoop.mapred.JobClient  -     Total time spent by all reduces waiting after reserving slots (ms)=0
11292 [main] INFO  org.apache.hadoop.mapred.JobClient  -     Total time spent by all maps waiting after reserving slots (ms)=0
11293 [main] INFO  org.apache.hadoop.mapred.JobClient  -     SLOTS_MILLIS_REDUCES=0