falcon-dev mailing list archives

From "Satish Mittal (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FALCON-357) HCatalog Feed replication: Hive export job fails when table partition contains multiple dated columns
Date Tue, 18 Mar 2014 06:36:42 GMT

    [ https://issues.apache.org/jira/browse/FALCON-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938885#comment-13938885 ]

Satish Mittal edited comment on FALCON-357 at 3/18/14 6:35 AM:
---------------------------------------------------------------

Here is the feed XML:

{code}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<feed name="hcat-in-repl" description="input" xmlns="uri:falcon:feed:0.1">
    <groups>hcatinputgroup</groups>
    <frequency>minutes(30)</frequency>
    <timezone>UTC</timezone>
    <late-arrival cut-off="hours(1)"/>
    <clusters>
        <cluster name="hcat-cluster1" type="source">
            <validity start="2014-03-11T09:00Z" end="2030-01-01T00:00Z"/>
            <retention limit="hours(2)" action="delete"/>
            <table uri="catalog:default:table3#year=${YEAR};month=${MONTH};day=${DAY};hour=${HOUR};minute=${MINUTE}"/>
        </cluster>
        <cluster name="hcat-cluster2" type="target">
            <validity start="2014-03-11T09:00Z" end="2030-01-01T00:00Z"/>
            <retention limit="hours(2)" action="delete"/>
            <table uri="catalog:default:table4#year=${YEAR};month=${MONTH};day=${DAY};hour=${HOUR};minute=${MINUTE}"/>
        </cluster>
    </clusters>
    <table uri="catalog:default:table3#year=${YEAR};month=${MONTH};day=${DAY};hour=${HOUR};minute=${MINUTE}"/>
    <ACL owner="testuser" group="group" permission="0x644"/>
    <schema location="/schema/log/log.format.csv" provider="csv"/>
</feed>
{code}
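Here is what the table URI template above should expand to for the first instance (2014-03-11T09:00Z): year=2014, month=03, day=11, hour=09, minute=00. The minimal Python sketch below shows that substitution. It assumes YEAR is four digits, that MONTH, DAY, HOUR and MINUTE are zero-padded to two digits (this matches the values quoted below), and that the instance time is already in the feed's timezone (UTC here). `resolve_partitions` is a hypothetical helper for illustration and is not Falcon's own expression evaluation.

```python
from datetime import datetime, timezone
from typing import List, Tuple


def resolve_partitions(table_uri: str, instance: datetime) -> List[Tuple[str, str]]:
    """Hypothetical helper: expand ${YEAR}..${MINUTE} in the partition part of a
    catalog:<db>:<table>#k=v;k=v URI into ordered (column, value) pairs."""
    # Assumption: YEAR is four digits; the other variables are zero-padded to two.
    # Assumption: `instance` is already expressed in the feed's timezone.
    values = {
        "YEAR": f"{instance.year:04d}",
        "MONTH": f"{instance.month:02d}",
        "DAY": f"{instance.day:02d}",
        "HOUR": f"{instance.hour:02d}",
        "MINUTE": f"{instance.minute:02d}",
    }
    # Everything after '#' is the partition template, e.g. "year=${YEAR};month=${MONTH}".
    _, _, template_part = table_uri.partition("#")
    pairs = []
    for item in template_part.split(";"):
        column, _, template = item.partition("=")
        for name, value in values.items():
            template = template.replace("${" + name + "}", value)
        pairs.append((column, template))
    return pairs


uri = ("catalog:default:table3#year=${YEAR};month=${MONTH};"
       "day=${DAY};hour=${HOUR};minute=${MINUTE}")
print(resolve_partitions(uri, datetime(2014, 3, 11, 9, 0, tzinfo=timezone.utc)))
# [('year', '2014'), ('month', '03'), ('day', '11'), ('hour', '09'), ('minute', '00')]
```

The pairs keep the order in which the columns appear in the URI, which here matches the tables' partition order.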

Both HCatalog tables, table3 and table4, are partitioned by (year, month, day, hour, minute).
During the Hive export step of feed replication, falconSourcePartition resolves to:

falconSourcePartition=(minute='00' AND month='03' AND year='2014' AND hour='09' AND day='11')
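This value is a boolean AND expression rather than a comma-separated partition spec. Its columns also appear in a different order from the tables' (year, month, day, hour, minute) declaration. The minimal Python sketch below parses such a string back into column/value pairs. It assumes every term has the form column='value' with no embedded quotes and that terms are joined only by AND. `parse_and_filter` is a hypothetical name.

```python
import re
from typing import Dict

# One term of the form column='value' (assumption: single-quoted values, no escapes).
TERM = re.compile(r"^\s*(\w+)\s*=\s*'([^']*)'\s*$")


def parse_and_filter(expr: str) -> Dict[str, str]:
    """Hypothetical helper: turn "(a='1' AND b='2')" into {'a': '1', 'b': '2'}."""
    body = expr.strip()
    # Drop one pair of enclosing parentheses, if present.
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1]
    result = {}
    for term in re.split(r"\s+AND\s+", body):
        match = TERM.match(term)
        if match is None:
            raise ValueError(f"unsupported filter term: {term!r}")
        result[match.group(1)] = match.group(2)
    return result


observed = "(minute='00' AND month='03' AND year='2014' AND hour='09' AND day='11')"
print(parse_and_filter(observed))
# {'minute': '00', 'month': '03', 'year': '2014', 'hour': '09', 'day': '11'}
```

The dict preserves the order of the original string, which makes the mismatch with the table's partition order easy to see.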

The same partition value also fails when used directly in the Hive CLI:

{code}
hive> export table default.table3 partition (minute='00' AND month='03' AND year='2014'
AND hour='09' AND day='11') to 'hdfs://hostname:9000//projects/falcon/hcluster1/staging/FALCON_FEED_REPLICATION_hcat-in-repl5_hcat-cluster1/default/table3/year=2014/2014-03-03-10-31/data';

FAILED: ParseException line 1:51 mismatched input 'AND' expecting ) near ''00'' in export statement
{code}

If I instead write the partition value as comma-separated, quoted key='value' pairs, the export works:

{code}
hive> export table default.table3 partition (year='2014',month='03',day='04',hour='06',minute='00')
to 'hdfs://hostname:9000//projects/falcon/hcluster1/staging/FALCON_FEED_REPLICATION_hcat-in-repl5_hcat-cluster1/default/table3/year=2014/2014-03-03-10-31/data';
Copying data from file:/tmp/hive/hive_2014-03-04_06-51-08_831_713233878547135455-1/-local-10000/_metadata
Copying file: file:/tmp/hive/hive_2014-03-04_06-51-08_831_713233878547135455-1/-local-10000/_metadata
Copying data from hdfs://hostname:9000/var/hcat/hcluster1/out3/2014/03/04/06/00
Copying file: hdfs://hostname:9000/var/hcat/hcluster1/out3/2014/03/04/06/00/_temporary
Copying file: hdfs://hostname:9000/var/hcat/hcluster1/out3/2014/03/04/06/00/part-r-00000
OK
{code}
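The working command suggests that the export statement should carry a comma-separated list of column='value' pairs. The minimal Python sketch below renders that form. It emits the columns in the table's declared partition order, mirroring the working command; whether Hive actually requires that order is not established here. The values used are those of the falconSourcePartition instance (2014-03-11 09:00). `build_export_statement`, its parameters and the target path `/path/to/staging/data` are hypothetical placeholders, not Falcon's API. The sketch also assumes the values contain no single quotes.

```python
from typing import Dict, Sequence


def build_export_statement(table: str, partition: Dict[str, str],
                           columns: Sequence[str], target: str) -> str:
    """Hypothetical helper: render a Hive EXPORT statement whose PARTITION clause
    is a comma-separated list of column='value' pairs."""
    missing = [c for c in columns if c not in partition]
    if missing:
        raise ValueError(f"missing partition values for {missing}")
    # Emit pairs in the table's declared partition-column order, as in the working command.
    spec = ",".join(f"{c}='{partition[c]}'" for c in columns)
    return f"export table {table} partition ({spec}) to '{target}';"


values = {"minute": "00", "month": "03", "year": "2014", "hour": "09", "day": "11"}
order = ["year", "month", "day", "hour", "minute"]
print(build_export_statement("default.table3", values, order, "/path/to/staging/data"))
# export table default.table3 partition (year='2014',month='03',day='11',hour='09',minute='00') to '/path/to/staging/data';
```

Raising on a missing column, rather than silently emitting a partial spec, keeps a misconfigured feed from exporting the wrong partition set.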


was (Author: satish.mittal):
The earlier version of this comment was the same as the one above, except that its Hive CLI example used unquoted partition values, which failed with a different error:

{code}
hive> export table default.table3 partition (year=2014,month=03,day=04,hour=06,minute=00)
to 'hdfs://hostname:9000//projects/falcon/hcluster1/staging/FALCON_FEED_REPLICATION_hcat-in-repl5_hcat-cluster1/default/table3/year=2014/2014-03-03-10-31/data';
FAILED: SemanticException [Error 10006]: Line 1:39 Partition not found '00'
{code}

> HCatalog Feed replication: Hive export job fails when table partition contains multiple dated columns
> -----------------------------------------------------------------------------------------------------
>
>                 Key: FALCON-357
>                 URL: https://issues.apache.org/jira/browse/FALCON-357
>             Project: Falcon
>          Issue Type: Bug
>            Reporter: Satish Mittal
>            Assignee: Satish Mittal
>
> Suppose a Falcon feed is based on an HCatalog table partitioned by (year, month, day, hour, minute). Feed replication fails during the Hive export with the following error:
> Intercepting System.exit(40000)
> Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [40000]
> stderr logs:
> FAILED: ParseException line 1:51 mismatched input 'AND' expecting ) near ''00'' in export statement



--
This message was sent by Atlassian JIRA
(v6.2#6252)
