falcon-dev mailing list archives

From "Satish Mittal (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FALCON-455) Replication of output feed of an HCatalog process not working
Date Thu, 29 May 2014 13:21:01 GMT
Satish Mittal created FALCON-455:
------------------------------------

             Summary: Replication of output feed of an HCatalog process not working
                 Key: FALCON-455
                 URL: https://issues.apache.org/jira/browse/FALCON-455
             Project: Falcon
          Issue Type: Bug
    Affects Versions: 0.5
            Reporter: Satish Mittal


Suppose there is an HCatalog process (java type) that takes an HCat input feed and outputs
another HCat feed. Further, this output feed is configured for replication across 2 clusters.

Replication of the output feed fails during the Hive import step. The reason is that the HCat
process's job output on HDFS contains a '_logs' directory if the process writes to a static
partition (or an empty '_temporary' directory if the process writes to a dynamic partition).
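To make the two layouts concrete, here is a minimal Python sketch that lists the subdirectories sitting directly inside a partition's output directory, which is what the import later trips over. It runs against a local temporary directory rather than HDFS, and the helper names ('make_layout', 'nested_dirs') and the file layouts are illustrative assumptions based on the listings in the logs, not Falcon or Hive code.

```python
import os
import tempfile


def make_layout(root, files, dirs):
    """Create a hypothetical job-output directory with empty files and empty dirs."""
    os.makedirs(root, exist_ok=True)
    for d in dirs:
        os.makedirs(os.path.join(root, d), exist_ok=True)
    for f in files:
        open(os.path.join(root, f), "w").close()


def nested_dirs(partition_dir):
    """Return the sorted names of subdirectories directly under partition_dir."""
    with os.scandir(partition_dir) as entries:
        return sorted(e.name for e in entries if e.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    # Static partition (as listed in the logs): _SUCCESS, a data file and a _logs dir.
    static_dir = os.path.join(tmp, "static")
    make_layout(static_dir, ["_SUCCESS", "part-r-00000"], ["_logs"])

    # Dynamic partition: a data file plus an empty _temporary dir.
    dynamic_dir = os.path.join(tmp, "dynamic")
    make_layout(dynamic_dir, ["part-r-00000"], ["_temporary"])

    static_nested = nested_dirs(static_dir)
    dynamic_nested = nested_dirs(dynamic_dir)

print(static_nested)   # ['_logs']
print(dynamic_nested)  # ['_temporary']
```

In both cases the data directory is not "files only", which is the condition the import step checks.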

The Hive import job logs contain the following error:

{noformat}
9036 [main] INFO  org.apache.hadoop.hive.ql.Driver  - Starting command: 
import table table5 partition (minute='25',month='05',year='2014',hour='12',day='29') from
'hdfs://databusdev2.mkhoj.com:9000//projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-25/hcat-cluster2/data'
9036 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger  - </PERFLOG method=TimeToSubmit
start=1401367057244 end=1401367057579 duration=335 from=org.apache.hadoop.hive.ql.Driver>
9036 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger  - <PERFLOG method=runTasks
from=org.apache.hadoop.hive.ql.Driver>
9036 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger  - <PERFLOG method=task.COPY.Stage-0
from=org.apache.hadoop.hive.ql.Driver>
9036 [main] INFO  org.apache.hadoop.hive.ql.exec.Task  - Copying data from hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-25/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=25
to hdfs://databusdev2.mkhoj.com:9000/tmp/hive-mapred/hive_2014-05-29_12-37-37_244_6437156794758917899-1/-ext-10000
9069 [main] INFO  org.apache.hadoop.hive.ql.exec.Task  - Copying file: hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-25/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=25/_SUCCESS
9096 [main] INFO  org.apache.hadoop.hive.ql.exec.Task  - Copying file: hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-25/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=25/_logs
9190 [main] INFO  org.apache.hadoop.hive.ql.exec.Task  - Copying file: hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-25/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=25/part-r-00000
9222 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger  - <PERFLOG method=task.DDL.Stage-1
from=org.apache.hadoop.hive.ql.Driver>
9580 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger  - </PERFLOG method=task.COPY.Stage-0
start=1401367057579 end=1401367058123 duration=544 from=org.apache.hadoop.hive.ql.Driver>
9580 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger  - <PERFLOG method=task.MOVE.Stage-2
from=org.apache.hadoop.hive.ql.Driver>
9581 [main] INFO  org.apache.hadoop.hive.ql.exec.Task  - Loading data to table default.table5
partition (day=29, hour=12, minute=25, month=05, year=2014) from hdfs://databusdev2.mkhoj.com:9000/tmp/hive-mapred/hive_2014-05-29_12-37-37_244_6437156794758917899-1/-ext-10000
9598 [main] INFO  org.apache.hadoop.hive.ql.exec.MoveTask  - Partition is: {day=29, hour=12,
minute=25, month=05, year=2014}
9668 [main] ERROR org.apache.hadoop.hive.ql.exec.Task  - Failed with exception checkPaths:
hdfs://databusdev2.mkhoj.com:9000/tmp/hive-mapred/hive_2014-05-29_12-37-37_244_6437156794758917899-1/-ext-10000
has nested directoryhdfs://databusdev2.mkhoj.com:9000/tmp/hive-mapred/hive_2014-05-29_12-37-37_244_6437156794758917899-1/-ext-10000/_logs
org.apache.hadoop.hive.ql.metadata.HiveException: checkPaths: hdfs://databusdev2.mkhoj.com:9000/tmp/hive-mapred/hive_2014-05-29_12-37-37_244_6437156794758917899-1/-ext-10000
has nested directoryhdfs://databusdev2.mkhoj.com:9000/tmp/hive-mapred/hive_2014-05-29_12-37-37_244_6437156794758917899-1/-ext-10000/_logs
	at org.apache.hadoop.hive.ql.metadata.Hive.checkPaths(Hive.java:2108)
	at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2298)
	at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1230)
	at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:408)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1532)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1305)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1136)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:976)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:966)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:359)
	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:457)
	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:467)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:748)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
	at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:318)
	at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:279)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39)
	at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:226)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
	at org.apache.hadoop.mapred.Child.main(Child.java:260)

9668 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger  - </PERFLOG method=task.MOVE.Stage-2
start=1401367058123 end=1401367058211 duration=88 from=org.apache.hadoop.hive.ql.Driver>
9672 [main] ERROR org.apache.hadoop.hive.ql.Driver  - FAILED: Execution Error, return code
1 from org.apache.hadoop.hive.ql.exec.MoveTask
{noformat}

Apparently, Hive import fails if the import path contains any directory: the MoveTask's
checkPaths step rejects the nested '_logs' directory. The same behavior can be reproduced from
the Hive CLI:

{noformat}
hive> import table table5 partition (minute='32',month='05',year='2014',hour='12',day='29')
from 'hdfs://databusdev2.mkhoj.com:9000//projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-32/hcat-cluster2/data'
    > ;
Copying data from hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-32/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=32
Copying file: hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-32/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=32/_SUCCESS
Copying file: hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-32/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=32/_logs
Copying file: hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-32/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=32/part-r-00000
Loading data to table default.table5 partition (day=29, hour=12, minute=32, month=05, year=2014)
Failed with exception checkPaths: hdfs://databusdev2.mkhoj.com:9000/tmp/hive-hive/hive_2014-05-29_13-13-43_867_8757094482694632648-1/-ext-10000
has nested directoryhdfs://databusdev2.mkhoj.com:9000/tmp/hive-hive/hive_2014-05-29_13-13-43_867_8757094482694632648-1/-ext-10000/_logs
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
hive>
{noformat}
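The failing step can be modeled as a check that rejects any subdirectory in the directory being loaded. Below is a minimal Python sketch of that check together with one possible workaround, which is a hypothetical suggestion and not a fix confirmed by this report: before running 'import table', delete subdirectories whose names start with '_' or '.' (the prefixes Hadoop's input formats conventionally treat as hidden, which covers '_logs' and '_temporary') from the staged data directory. The helpers ('list_entries', 'check_paths', 'strip_hidden_dirs') are illustrative names, and the sketch works on a local temporary directory instead of HDFS.

```python
import os
import shutil
import tempfile


def list_entries(path):
    """Return the DirEntry objects directly under path, sorted by name."""
    with os.scandir(path) as entries:
        return sorted(entries, key=lambda e: e.name)


def check_paths(src_dir):
    """Simplified stand-in for the failing check: reject any subdirectory.

    The message mirrors the Hive log above; this is not Hive's implementation.
    """
    for entry in list_entries(src_dir):
        if entry.is_dir():
            raise RuntimeError(f"checkPaths: {src_dir} has nested directory{entry.path}")


def strip_hidden_dirs(data_dir):
    """Hypothetical pre-import cleanup: delete '_'- or '.'-prefixed subdirectories.

    Files (including _SUCCESS and part-* data files) are left untouched.
    Returns the names of the directories that were removed.
    """
    removed = []
    for entry in list_entries(data_dir):
        if entry.is_dir() and entry.name.startswith(("_", ".")):
            shutil.rmtree(entry.path)
            removed.append(entry.name)
    return removed


with tempfile.TemporaryDirectory() as data:
    # Illustrative staged partition: _SUCCESS, one data file and a _logs dir.
    os.makedirs(os.path.join(data, "_logs"))
    for name in ("_SUCCESS", "part-r-00000"):
        open(os.path.join(data, name), "w").close()

    try:
        check_paths(data)
        failed_before = False
    except RuntimeError as err:
        failed_before = True
        print("before cleanup:", err)

    removed = strip_hidden_dirs(data)
    check_paths(data)  # no longer raises: only files remain
    remaining = [e.name for e in list_entries(data)]

print(removed, remaining)  # ['_logs'] ['_SUCCESS', 'part-r-00000']
```

Deleting in place is the bluntest option; copying only regular files into a fresh staging directory would avoid mutating the replicated data. On a real cluster the same logic would presumably go through the Hadoop FileSystem API against HDFS rather than Python's os module.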



--
This message was sent by Atlassian JIRA
(v6.2#6252)
