falcon-dev mailing list archives

From "Satish Mittal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-284) Hcatalog based feed retention doesn't work when partition filter spans across multiple partition keys
Date Fri, 31 Jan 2014 12:24:09 GMT

    [ https://issues.apache.org/jira/browse/FALCON-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887687#comment-13887687 ]

Satish Mittal commented on FALCON-284:

Suppose we have an HCatalog table, table1, that is PARTITIONED BY (year STRING, month STRING, day STRING, hour STRING, minute STRING).

Suppose we also submit a Falcon feed for table1 with a retention limit of 2 hours:

        <cluster name="hcat-cluster">
            <validity start="2013-01-01T00:00Z" end="2030-01-01T00:00Z"/>
            <retention limit="hours(2)" action="delete"/>
        </cluster>

    <table uri="catalog:default:table1#year=${YEAR};month=${MONTH};day=${DAY};hour=${HOUR};minute=${MINUTE}"/>
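To make the key-to-pattern mapping concrete, here is a minimal Python sketch (not Falcon's code) that reads the partition spec after `#` in the table URI and lists every partition key bound to a date variable. The `VARIABLE_TO_PATTERN` table and the function name `date_partition_keys` are hypothetical, chosen for illustration. For the URI above, all five keys carry a date pattern, which is why a filter built from only one of them cannot express a cutoff at hour granularity.

```python
import re

# Hypothetical mapping from Falcon date variables to the date patterns named
# in this issue (yyyy | MM | dd | HH | mm); illustrative, not Falcon's table.
VARIABLE_TO_PATTERN = {
    "YEAR": "yyyy",
    "MONTH": "MM",
    "DAY": "dd",
    "HOUR": "HH",
    "MINUTE": "mm",
}


def date_partition_keys(table_uri: str) -> list[tuple[str, str]]:
    """Return (partition key, date pattern) pairs, in URI order, for every
    partition key whose value is a single ${VARIABLE} date placeholder."""
    # Everything after '#' is the partition spec: key=value pairs joined by ';'.
    _, _, spec = table_uri.partition("#")
    keys = []
    for pair in filter(None, spec.split(";")):
        key, _, value = pair.partition("=")
        match = re.fullmatch(r"\$\{(\w+)\}", value)
        if match and match.group(1) in VARIABLE_TO_PATTERN:
            keys.append((key, VARIABLE_TO_PATTERN[match.group(1)]))
    return keys


uri = ("catalog:default:table1#year=${YEAR};month=${MONTH};"
       "day=${DAY};hour=${HOUR};minute=${MINUTE}")
print(date_partition_keys(uri))
# → [('year', 'yyyy'), ('month', 'MM'), ('day', 'dd'), ('hour', 'HH'), ('minute', 'mm')]
```

Keys whose value is not a date placeholder (for example a hypothetical `region=us`) are skipped, so only time-based keys would feed into a retention filter.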

The retention jobs for this feed succeed; however, the partition filter they compute considers only the *year* key. Here is a snippet of the task log:

*2014-01-30 12:12:10,940 INFO  - List partitions for : table1, partition filter: year < '2014' (HiveCatalogService:134)*
2014-01-30 12:12:11,519 WARN  - DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore. (HiveConf:1231)
2014-01-30 12:12:11,844 INFO  - Trying to connect to metastore with URI thrift://localhost:5055
2014-01-30 12:12:11,881 INFO  - Waiting 1 seconds before next connection attempt. (metastore:327)
2014-01-30 12:12:12,881 INFO  - Connected to metastore. (metastore:337)
2014-01-30 12:12:12,930 INFO  - Caching HCatalog client object for thrift://localhost:5055
2014-01-30 12:12:12,971 INFO  - No partitions to delete. (FeedEvictor:389)

> Hcatalog based feed retention doesn't work when partition filter spans across multiple partition keys
> -----------------------------------------------------------------------------------------------------
>                 Key: FALCON-284
>                 URL: https://issues.apache.org/jira/browse/FALCON-284
>             Project: Falcon
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Satish Mittal
> When an HCatalog-based feed is scheduled in Falcon, retention looks only at the first
> partition key that matches any of the date patterns yyyy | MM | dd | HH | mm. As a result,
> it computes a partition filter that contains only one of these patterns. However, if the
> HCatalog table is defined so that the date spans multiple partition keys (year/month/day/hour/minute),
> feed retention does not delete any partitions at a finer granularity than the first level (year).
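One way to cover all the keys is to compare them lexicographically, most significant first: a partition is older than the cutoff if its year is smaller, or its year is equal and its month is smaller, and so on down to minute. The Python sketch below builds such a filter. It is a hypothetical illustration, not the fix adopted in Falcon; `DATE_KEYS`, `retention_filter`, and `is_older` are invented names, and it assumes zero-padded partition values (so string order matches time order) and a metastore filter syntax that accepts `and`/`or` with parentheses, which should be confirmed against the Hive version in use.

```python
from datetime import datetime, timedelta

# Hypothetical key -> strftime format mapping for table1; assumes partition
# values are zero-padded, so string comparison matches chronological order.
DATE_KEYS = [
    ("year", "%Y"),
    ("month", "%m"),
    ("day", "%d"),
    ("hour", "%H"),
    ("minute", "%M"),
]


def retention_filter(cutoff: datetime, date_keys=DATE_KEYS) -> str:
    """Build a filter selecting partitions strictly older than `cutoff`,
    comparing the date keys lexicographically (most significant first)."""
    values = [(key, cutoff.strftime(fmt)) for key, fmt in date_keys]
    clauses = []
    for i, (key, value) in enumerate(values):
        # All more significant keys equal the cutoff; this key is smaller.
        terms = [f"{k} = '{v}'" for k, v in values[:i]]
        terms.append(f"{key} < '{value}'")
        clauses.append("(" + " and ".join(terms) + ")")
    return " or ".join(clauses)


def is_older(partition: dict, cutoff: datetime, date_keys=DATE_KEYS) -> bool:
    """Reference check: compare the partition's key tuple with the cutoff's."""
    part = tuple(partition[key] for key, _ in date_keys)
    limit = tuple(cutoff.strftime(fmt) for _, fmt in date_keys)
    return part < limit


# Illustrative cutoff: the log's 2014-01-30 12:12 minus the hours(2) limit.
cutoff = datetime(2014, 1, 30, 12, 12) - timedelta(hours=2)
print(retention_filter(cutoff))

partition = {"year": "2014", "month": "01", "day": "30", "hour": "08", "minute": "00"}
print(is_older(partition, cutoff))  # → True
```

The first clause of the generated filter is exactly the `year < '2014'` seen in the task log; the remaining clauses are what allow a partition such as 2014-01-30 08:00, which the year-only filter cannot match, to be selected for deletion.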

This message was sent by Atlassian JIRA
